Open tools for running local AI
A manager for TurboQuant+ by TheTom: run local LLMs with auto-configured KV cache compression.
- turbo4 compresses the KV cache to 4.25 bits per value: 8× more context in the same RAM.
- Detects your GPU, RAM, and quant type, then calculates optimal cache settings automatically.
- Serves an OpenAI-compatible `/v1/chat/completions` endpoint. Works with any client: curl, Pi, LM Studio.
- Stops after 5 minutes of idle time, like Ollama, so no resources are wasted.
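Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library; the host, port, and model name below are assumptions for illustration (check `tq status` for your actual server address), not values the tool guarantees:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed local address; confirm with `tq status`

def build_chat_request(prompt: str, model: str = "local-model") -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions shape."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt: str) -> str:
    """Send one chat turn and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape is what curl, Pi, or LM Studio send under the hood.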
| | Without tq | With tq |
|---|---|---|
| 8GB RAM, 3B model | 4,096 context | 32,768 context |
| KV cache memory | f16 (full size) | 3.8× compressed |
| Config | Manual flags | Auto-detected |
| Idle behavior | Always running | Auto-stops |
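The table's numbers are straightforward arithmetic: an f16 cache spends 16 bits per value, so 4.25 bits per value is roughly a 3.8× size reduction, while 4,096 → 32,768 is an 8× context gain (the extra headroom beyond 3.8× presumably comes from the rest of the memory budget; that part is an inference, not a documented claim). A quick check:

```python
F16_BITS = 16.0   # bits per cache value at full precision
TQ_BITS = 4.25    # bits per value after turbo4 compression

compression = F16_BITS / TQ_BITS
print(f"KV cache compression: {compression:.1f}x")  # ~3.8x, matching the table

context_gain = 32_768 / 4_096
print(f"Context gain on the 8GB/3B example: {context_gain:.0f}x")  # 8x
```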
Install:

```sh
curl -fsSL https://wondermotor.com/install.sh | bash
```

Then: `tq list` to see local models, `tq search "qwen2.5 coder 7b"` and `tq download <model>` to fetch one, and `tq serve 1` to launch it with TurboQuant auto-configured.

| Command | Description |
|---|---|
| `tq list` | List local GGUF models |
| `tq search <query>` | Search HuggingFace |
| `tq download <model>` | Download with SHA256 verification |
| `tq serve 1` | Launch with auto TQ config |
| `tq serve 1 --dry-run` | Preview the launch command |
| `tq status` | Check the running server |
| `tq stop` | Stop the server |
| `tq logs` | View server logs |
| `tq install` | Install the TurboQuant+ binary |
| `tq doctor` | Verify the setup |
| `tq config show` | Show/edit config |
Wondermotor AI is building more open tools for local AI. Stay tuned.