LLMfit

157 models. 30 providers. One command.

Detects your hardware, scores every model across quality, speed, and fit, then tells you exactly which ones will run well on your machine.

Quick install
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
or: cargo install llmfit
or: brew install AlexsJones/llmfit/llmfit
157 Models | 30 Providers | 6 GPU Backends | 4 Scoring Dimensions
Features
Hardware Detection
Auto-detects RAM, CPU cores, and GPU. Supports NVIDIA (multi-GPU), AMD ROCm, Intel Arc, and Apple Silicon unified memory.
Multi-dimensional Scoring
Every model scored on Quality, Speed, Fit, and Context. Weights adapt to use case: coding prioritizes quality, chat prioritizes speed.
Multi-GPU & MoE
Aggregates VRAM across multiple GPUs. Mixture-of-Experts models are scored by active parameters, not total: Mixtral 8x7B needs 6.6 GB, not 24 GB.
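As a back-of-the-envelope check on that claim, the MoE-aware estimate can be sketched as below. The constants are illustrative assumptions, not LLMfit's internals: Mixtral 8x7B routes 2 of 8 experts per token, leaving roughly 12.9B of its 46.7B parameters active, and Q4_K_M averages roughly 0.51 bytes per parameter.

```rust
// Why MoE models are scored by active parameters rather than total.
// Assumed constants (not LLMfit internals): ~12.9B active params for
// Mixtral 8x7B, ~0.51 effective bytes/param at Q4_K_M.
const Q4_K_M_BYTES_PER_PARAM: f64 = 0.51;

/// Estimated weight memory in GB for a parameter count given in billions.
fn est_mem_gb(params_billion: f64, bytes_per_param: f64) -> f64 {
    params_billion * bytes_per_param
}

fn main() {
    let naive = est_mem_gb(46.7, Q4_K_M_BYTES_PER_PARAM); // all 46.7B params
    let moe = est_mem_gb(12.9, Q4_K_M_BYTES_PER_PARAM);   // active params only
    println!("total-params estimate:  {naive:.1} GB"); // ~23.8 GB
    println!("active-params estimate: {moe:.1} GB");   // ~6.6 GB
}
```

Under these assumptions the active-parameter figure lands near the 6.6 GB in the table, while a naive total-parameter estimate would be roughly 24 GB.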
Interactive TUI
Full terminal UI with search, filtering by fit level and provider, model details, and direct Ollama integration for downloading models.
Dynamic Quantization
Picks the best quality quantization that fits your hardware. Walks from Q8_0 down to Q2_K, maximizing quality within memory limits.
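The walk described above can be sketched as a first-fit search over quantization levels. The bits-per-weight figures are rough llama.cpp-style averages and the 1.2x factor is a placeholder for KV-cache and runtime overhead; both are assumptions for illustration, not LLMfit's actual tables.

```rust
// Sketch of the quantization walk: try quants from highest to lowest
// quality and keep the first whose estimated footprint fits the budget.
// Bits-per-weight values are approximate assumptions; 1.2x stands in
// for KV-cache/runtime overhead.
const QUANTS: &[(&str, f64)] = &[
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.4),
];

/// Highest-quality quant that fits, or None if even Q2_K is too big.
fn pick_quant(params_billion: f64, budget_gb: f64) -> Option<&'static str> {
    QUANTS.iter().find_map(|&(name, bpw)| {
        let est_gb = params_billion * bpw / 8.0 * 1.2; // weights + overhead
        (est_gb <= budget_gb).then_some(name)
    })
}

fn main() {
    // An 8B model with 24 GB free keeps full Q8_0 quality...
    println!("{:?}", pick_quant(8.0, 24.0)); // Some("Q8_0")
    // ...but with only 6 GB free it steps down until something fits.
    println!("{:?}", pick_quant(8.0, 6.0)); // Some("Q4_K_M")
}
```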
Ollama Integration
See which models you already have installed. Pull new ones directly from the TUI. Supports remote Ollama instances via OLLAMA_HOST.
See it in action
llmfit -- 157 models
System: Apple M2 Pro | 32 GB unified | Metal | Ollama: 8 installed

  #  Score  Model                          Params  Quant   tok/s  Fit       VRAM
  ── ─────  ─────────────────────────────  ──────  ──────  ─────  ────────  ────
  1  92     Qwen3-8B                      8.2B    Q8_0    38.2   Perfect   8.4 GB
  2  89     Llama-3.1-8B-Instruct         8.0B    Q8_0    36.5   Perfect   8.2 GB
  3  87     Mistral-7B-Instruct-v0.3      7.2B    Q8_0    40.1   Perfect   7.4 GB
  4  85     Gemma-3-12b-it                12B     Q6_K    28.7   Perfect   12.6 GB
  5  83     Mixtral-8x7B (MoE)            46.7B   Q4_K_M  22.4   Good      6.6 GB
  6  81     Qwen2.5-Coder-14B-Instruct    14.8B   Q4_K_M  25.3   Perfect   14.2 GB
  7  78     Mistral-Small-24B             24B     Q4_K_M  18.1   Good      18.4 GB
  8  74     Qwen3-32B                     32.8B   Q4_K_M  12.8   Marginal  24.6 GB
  9  71     Llama-3.3-70B-Instruct        70.6B   Q2_K    6.2    Too Tight 38.4 GB
 10  68     DeepSeek-R1 (MoE)             671B    Q2_K    --     Too Tight 186 GB

  Filter: All | /search | f:fit | p:provider | d:download | q:quit
How it works
01. Detect
Reads RAM and CPU cores, and probes every GPU. VRAM is aggregated across multiple GPUs; the backend is auto-identified.
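The aggregation step after probing can be sketched as a simple sum over detected devices. The struct and field names here are illustrative, not LLMfit's actual types.

```rust
// After each GPU is probed, usable memory is pooled across devices
// so large models can be judged against total VRAM, not per-card VRAM.
// Types and names are illustrative assumptions.
struct Gpu {
    name: &'static str,
    vram_gb: f64,
}

/// Total VRAM available for model weights across all detected GPUs.
fn total_vram_gb(gpus: &[Gpu]) -> f64 {
    gpus.iter().map(|g| g.vram_gb).sum()
}

fn main() {
    let gpus = [
        Gpu { name: "RTX 4090", vram_gb: 24.0 },
        Gpu { name: "RTX 4090", vram_gb: 24.0 },
    ];
    // A dual-card box is treated as one 48 GB pool for fit purposes.
    println!("{} GPUs, {} GB total", gpus.len(), total_vram_gb(&gpus));
}
```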
02. Quantize
For each model, walks from Q8_0 down to Q2_K and picks the highest-quality quantization that fits your memory.
03. Score
Quality, Speed, Fit, and Context are each scored 0-100. Weights shift with the use case (coding, chat, reasoning).
04. Recommend
Models are ranked by composite score: perfect fits at the top, unrunnable models at the bottom. Download directly via Ollama.
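The score-and-rank steps above can be sketched as a weighted sum over the four dimensions followed by a sort. The weight values below are illustrative assumptions for a "coding" profile, not LLMfit's actual tuning.

```rust
// Per-dimension scores (each 0-100) combined by a use-case weight
// vector, then sorted best-first. Weights are assumed for illustration.
struct Scores {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

struct Weights {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

// Hypothetical "coding" profile: quality weighted most heavily.
const CODING: Weights = Weights { quality: 0.4, speed: 0.15, fit: 0.3, context: 0.15 };

fn composite(s: &Scores, w: &Weights) -> f64 {
    s.quality * w.quality + s.speed * w.speed + s.fit * w.fit + s.context * w.context
}

fn main() {
    let mut models = vec![
        ("Qwen3-8B", Scores { quality: 90.0, speed: 85.0, fit: 100.0, context: 80.0 }),
        ("Llama-3.3-70B", Scores { quality: 96.0, speed: 30.0, fit: 40.0, context: 85.0 }),
    ];
    // Rank best-first by composite score.
    models.sort_by(|a, b| composite(&b.1, &CODING).total_cmp(&composite(&a.1, &CODING)));
    for (name, s) in &models {
        println!("{name}: {:.1}", composite(s, &CODING));
    }
}
```

Under this profile, a strong raw model that fits poorly (the 70B at Q2_K) still ranks below a slightly weaker model that runs comfortably.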
30 Providers
Meta Llama
Mistral AI
Alibaba Qwen
Google Gemma
Microsoft Phi
DeepSeek
IBM Granite
xAI Grok
Cohere
BigCode
01.ai Yi
Upstage SOLAR
TII Falcon
HuggingFace
Zhipu GLM
Moonshot Kimi
Baidu ERNIE
Allen Institute
LMSYS Vicuna
NousResearch
Stability AI
BigScience
WizardLM
OpenChat
Nomic
BAAI
Ant Group
Rednote
Meituan
Community

Stop guessing. Start running.

One command tells you which models fit your hardware. Written in Rust. Zero runtime dependencies. Works offline.

View on GitHub | crates.io