Local Gemma Chat
Ask questions and get AI replies locally in your browser using WebGPU. Once loaded, the model runs entirely on-device.
Checking browser support…
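The support check reduces to probing for the WebGPU API. A minimal sketch (assumes WebGPU type definitions such as @webgpu/types are available):

```ts
// Minimal WebGPU support probe (sketch). navigator.gpu is only present
// in browsers that ship WebGPU, and requestAdapter() can still return
// null on unsupported hardware, so both checks are needed.
async function checkWebGpuSupport(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```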
Model settings
Start with a small model on iPhone, then work down the list toward Gemma 4.
WebLLM runs on MLC's sharded WebGPU runtime and applies lower memory settings on iOS.
Default iOS-safe trial: Llama 3.2 1B q4 with a 1k-token context override.
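A minimal sketch of that trial load, assuming the @mlc-ai/web-llm package; the model id string comes from WebLLM's prebuilt model list, and the 1k context override is passed as a chat option:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Load the iOS-safe default: Llama 3.2 1B (q4), with the context
// window capped at 1k tokens to stay within mobile memory limits.
const engine = await CreateMLCEngine(
  "Llama-3.2-1B-Instruct-q4f16_1-MLC",
  { initProgressCallback: (report) => console.log(report.text) },
  { context_window_size: 1024 },
);
```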
Use a web-compatible *-web.task or *-Web.litertlm file served with CORS enabled.
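A sketch of loading such a file with MediaPipe's @mediapipe/tasks-genai package; the model URL below is a placeholder and must point to a host that sends CORS headers:

```ts
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Placeholder model URL: replace with your own web build
// (*-web.task / *-Web.litertlm) on a CORS-enabled server.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm",
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "https://example.com/gemma-3n-web.task" },
});
```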
Higher values allow longer answers and more chat history; lower values are safer on mobile.
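Where this value applies depends on the runtime: in WebLLM the context window is fixed at load time (see the earlier snippet), while each reply can additionally be capped per request. A sketch reusing the engine from above:

```ts
// context_window_size (set at load time) bounds prompt + history;
// max_tokens additionally caps this single reply.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in two sentences." }],
  max_tokens: 256, // keep small on memory-constrained phones
});
console.log(reply.choices[0].message.content);
```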
Checking model cache…
If the model is already cached by this app, loading can start from local storage instead of downloading again.
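For the WebLLM path, the cache check can use the package's hasModelInCache helper; a minimal sketch:

```ts
import { hasModelInCache } from "@mlc-ai/web-llm";

// WebLLM stores model shards in browser storage (Cache API / IndexedDB).
// If the shards are already present, engine creation reloads them
// locally instead of re-downloading.
const cached = await hasModelInCache("Llama-3.2-1B-Instruct-q4f16_1-MLC");
console.log(cached ? "Loading from local cache" : "Will download model");
```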
Idle