Local Gemma Chat
Ask questions and get AI replies locally in your browser using WebGPU. Once loaded, the model runs entirely on-device.
Checking browser support…
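The support check reduces to probing for the WebGPU API. A minimal sketch (assumes WebGPU type definitions such as @webgpu/types are available):

```ts
// Minimal WebGPU support probe (sketch). navigator.gpu is only present
// in browsers that ship WebGPU, and requestAdapter() can still return
// null on unsupported hardware, so both checks are needed.
async function checkWebGpuSupport(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```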
Model settings
Start with a small model on iPhone, then work down the list toward Gemma 4.
WebLLM runs on MLC's sharded WebGPU runtime and applies lower memory settings on iOS.
Default iOS-safe trial: Llama 3.2 1B q4 with a 1k-token context override.
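A minimal sketch of that trial load, assuming the @mlc-ai/web-llm package; the model id string comes from WebLLM's prebuilt model list, and the 1k context override is passed as a chat option:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load the iOS-safe default: Llama 3.2 1B (q4), with the context
// window capped at 1k tokens to stay within mobile memory limits.
const engine = await CreateMLCEngine(
  "Llama-3.2-1B-Instruct-q4f16_1-MLC",
  { initProgressCallback: (report) => console.log(report.text) },
  { context_window_size: 1024 },
);
```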
Use a web-compatible *-web.task or *-Web.litertlm file served with CORS enabled.
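A sketch of loading such a file with MediaPipe's @mediapipe/tasks-genai package; the model URL below is a placeholder and must point to a host that sends CORS headers:

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Placeholder model URL: replace with your own web build
// (*-web.task / *-Web.litertlm) on a CORS-enabled server.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm",
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "https://example.com/gemma-3n-web.task" },
});
```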
Higher values allow longer answers and more chat history; lower values are safer on mobile.
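Where this value applies depends on the runtime: in WebLLM the context window is fixed at load time (see the earlier snippet), while each reply can additionally be capped per request. A sketch reusing the engine from above:

```ts
// context_window_size (set at load time) bounds prompt + history;
// max_tokens additionally caps this single reply.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in two sentences." }],
  max_tokens: 256, // keep small on memory-constrained phones
});
console.log(reply.choices[0].message.content);
```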
Checking model cache…
If the model is already cached by this app, loading can start from local storage instead of downloading again.
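For the WebLLM path, the cache check can use the package's hasModelInCache helper; a minimal sketch:

```ts
import { hasModelInCache } from "@mlc-ai/web-llm";

// WebLLM stores model shards in browser storage (Cache API / IndexedDB).
// If the shards are already present, engine creation reloads them
// locally instead of re-downloading.
const cached = await hasModelInCache("Llama-3.2-1B-Instruct-q4f16_1-MLC");
console.log(cached ? "Loading from local cache" : "Will download model");
```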
Idle