If you were running an LLM locally on Android through llama.cpp for use as a private personal assistant, what model would you use?
Thanks for any recommendations in advance.
It very much depends on your phone's hardware: RAM limits how big a model you can load, and the CPU determines how fast you'll get replies. I've successfully run 4B models on my 8GB RAM phone. However, it's the usual server-and-client setup, which needs full internet access because Android lacks granular network permissions (even all-in-one setups need open ports to connect to themselves), so I prefer a proper home server. With a cheap graphics card, that is far faster and more capable.
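To put rough numbers on the RAM point, here is a back-of-the-envelope sketch in Python. Everything in it is my own assumption rather than anything measured: about 4.5 bits per weight for a typical 4-bit quantization, about 1 GB of extra room for context and runtime buffers, and only about half the phone's RAM being usable by the app. The function names are made up for illustration.

```python
def estimate_model_ram_gb(params_billions, bits_per_weight=4.5, overhead_gb=1.0):
    """Rough RAM needed to run a quantized model, in GB (1 GB = 1e9 bytes).

    bits_per_weight and overhead_gb are assumed ballpark values,
    not figures reported by llama.cpp.
    """
    # Billions of parameters * bits per parameter / 8 bits per byte = GB of weights.
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_ram(params_billions, phone_ram_gb, usable_fraction=0.5, **kwargs):
    """True if the estimate fits in the share of RAM assumed to be available."""
    needed_gb = estimate_model_ram_gb(params_billions, **kwargs)
    return needed_gb <= phone_ram_gb * usable_fraction


if __name__ == "__main__":
    phone_ram_gb = 8
    for size in (1, 4, 8, 14):
        needed = estimate_model_ram_gb(size)
        verdict = "fits" if fits_in_ram(size, phone_ram_gb) else "too big"
        print(f"{size}B model: ~{needed:.2f} GB needed, {verdict} on {phone_ram_gb} GB")
```

With those assumptions, a 4B model comes out at about 3.25 GB and fits on an 8GB phone, while an 8B model at about 5.5 GB does not, which lines up with what I saw in practice. Treat it as a sanity check only; real usage shifts with the quantization and context length you pick.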
I was honestly impressed with the speed and accuracy I was getting with DeepSeek, Llama, and Gemma on my 1660 Ti.
It cost $100 used, and responses came back in seconds.