LLMfit

157 models. 30 providers. One command.

Detects your hardware, scores every model across quality, speed, and fit, then tells you exactly which ones will run well on your machine.

Quick install
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
or: cargo install llmfit
or: brew install AlexsJones/llmfit/llmfit
157 Models | 30 Providers | 6 GPU Backends | 4 Scoring Dimensions
Features
Hardware Detection
Auto-detects RAM, CPU cores, and GPU. Supports NVIDIA (multi-GPU), AMD ROCm, Intel Arc, and Apple Silicon unified memory.
Multi-dimensional Scoring
Every model scored on Quality, Speed, Fit, and Context. Weights adapt to use case: coding prioritizes quality, chat prioritizes speed.
Multi-GPU & MoE
Aggregates VRAM across multiple GPUs. Mixture-of-Experts models are scored by active parameters, not total: Mixtral 8x7B needs 6.6 GB, not 24 GB.
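As a back-of-the-envelope check on that claim, the MoE-aware estimate can be sketched as below. The constants are illustrative assumptions, not LLMfit's internals: Mixtral 8x7B routes 2 of 8 experts per token, leaving roughly 12.9B of its 46.7B parameters active, and Q4_K_M averages roughly 0.51 bytes per parameter.

```rust
// Why MoE models are scored by active parameters rather than total.
// Assumed constants (not LLMfit internals): ~12.9B active params for
// Mixtral 8x7B, ~0.51 effective bytes/param at Q4_K_M.
const Q4_K_M_BYTES_PER_PARAM: f64 = 0.51;

/// Estimated weight memory in GB for a parameter count given in billions.
fn est_mem_gb(params_billion: f64, bytes_per_param: f64) -> f64 {
    params_billion * bytes_per_param
}

fn main() {
    let naive = est_mem_gb(46.7, Q4_K_M_BYTES_PER_PARAM); // all 46.7B params
    let moe = est_mem_gb(12.9, Q4_K_M_BYTES_PER_PARAM);   // active params only
    println!("total-params estimate:  {naive:.1} GB"); // ~23.8 GB
    println!("active-params estimate: {moe:.1} GB");   // ~6.6 GB
}
```

Under these assumptions the active-parameter figure lands near the 6.6 GB in the table, while a naive total-parameter estimate would be roughly 24 GB.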
Interactive TUI
Full terminal UI with search, filtering by fit level and provider, model details, and direct Ollama integration for downloading models.
Dynamic Quantization
Picks the best quality quantization that fits your hardware. Walks from Q8_0 down to Q2_K, maximizing quality within memory limits.
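The walk described above can be sketched as a first-fit search over quantization levels. The bits-per-weight figures are rough llama.cpp-style averages and the 1.2x factor is a placeholder for KV-cache and runtime overhead; both are assumptions for illustration, not LLMfit's actual tables.

```rust
// Sketch of the quantization walk: try quants from highest to lowest
// quality and keep the first whose estimated footprint fits the budget.
// Bits-per-weight values are approximate assumptions; 1.2x stands in
// for KV-cache/runtime overhead.
const QUANTS: &[(&str, f64)] = &[
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8),
    ("Q3_K_M", 3.9),
    ("Q2_K", 3.4),
];

/// Highest-quality quant that fits, or None if even Q2_K is too big.
fn pick_quant(params_billion: f64, budget_gb: f64) -> Option<&'static str> {
    QUANTS.iter().find_map(|&(name, bpw)| {
        let est_gb = params_billion * bpw / 8.0 * 1.2; // weights + overhead
        (est_gb <= budget_gb).then_some(name)
    })
}

fn main() {
    // An 8B model with 24 GB free keeps full Q8_0 quality...
    println!("{:?}", pick_quant(8.0, 24.0)); // Some("Q8_0")
    // ...but with only 6 GB free it steps down until something fits.
    println!("{:?}", pick_quant(8.0, 6.0)); // Some("Q4_K_M")
}
```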
Ollama Integration
See which models you already have installed. Pull new ones directly from the TUI. Supports remote Ollama instances via OLLAMA_HOST.
See it in action
llmfit -- 157 models
System: Apple M2 Pro | 32 GB unified | Metal | Ollama: 8 installed

  #  Score  Model                          Params  Quant   tok/s  Fit       VRAM
  ── ─────  ─────────────────────────────  ──────  ──────  ─────  ────────  ────
  1  92     Qwen3-8B                      8.2B    Q8_0    38.2   Perfect   8.4 GB
  2  89     Llama-3.1-8B-Instruct         8.0B    Q8_0    36.5   Perfect   8.2 GB
  3  87     Mistral-7B-Instruct-v0.3      7.2B    Q8_0    40.1   Perfect   7.4 GB
  4  85     Gemma-3-12b-it                12B     Q6_K    28.7   Perfect   12.6 GB
  5  83     Mixtral-8x7B (MoE)            46.7B   Q4_K_M  22.4   Good      6.6 GB
  6  81     Qwen2.5-Coder-14B-Instruct    14.8B   Q4_K_M  25.3   Perfect   14.2 GB
  7  78     Mistral-Small-24B             24B     Q4_K_M  18.1   Good      18.4 GB
  8  74     Qwen3-32B                     32.8B   Q4_K_M  12.8   Marginal  24.6 GB
  9  71     Llama-3.3-70B-Instruct        70.6B   Q2_K    6.2    Too Tight 38.4 GB
 10  68     DeepSeek-R1 (MoE)             671B    Q2_K    --     Too Tight 186 GB

  Filter: All | /search | f:fit | p:provider | d:download | q:quit
How it works
01. Detect
Reads RAM and CPU cores, and probes every GPU. VRAM is aggregated across multiple GPUs; the backend is auto-identified.
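The aggregation step after probing can be sketched as a simple sum over detected devices. The struct and field names here are illustrative, not LLMfit's actual types.

```rust
// After each GPU is probed, usable memory is pooled across devices
// so large models can be judged against total VRAM, not per-card VRAM.
// Types and names are illustrative assumptions.
struct Gpu {
    name: &'static str,
    vram_gb: f64,
}

/// Total VRAM available for model weights across all detected GPUs.
fn total_vram_gb(gpus: &[Gpu]) -> f64 {
    gpus.iter().map(|g| g.vram_gb).sum()
}

fn main() {
    let gpus = [
        Gpu { name: "RTX 4090", vram_gb: 24.0 },
        Gpu { name: "RTX 4090", vram_gb: 24.0 },
    ];
    // A dual-card box is treated as one 48 GB pool for fit purposes.
    println!("{} GPUs, {} GB total", gpus.len(), total_vram_gb(&gpus));
}
```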
02. Quantize
For each model, walks from Q8_0 down to Q2_K and picks the highest-quality quantization that fits your memory.
03. Score
Quality, Speed, Fit, and Context are each scored 0-100. Weights shift with the use case (coding, chat, reasoning).
04. Recommend
Models are ranked by composite score: perfect fits at the top, unrunnable models at the bottom. Download directly via Ollama.
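The score-and-rank steps above can be sketched as a weighted sum over the four dimensions followed by a sort. The weight values below are illustrative assumptions for a "coding" profile, not LLMfit's actual tuning.

```rust
// Per-dimension scores (each 0-100) combined by a use-case weight
// vector, then sorted best-first. Weights are assumed for illustration.
struct Scores {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

struct Weights {
    quality: f64,
    speed: f64,
    fit: f64,
    context: f64,
}

// Hypothetical "coding" profile: quality weighted most heavily.
const CODING: Weights = Weights { quality: 0.4, speed: 0.15, fit: 0.3, context: 0.15 };

fn composite(s: &Scores, w: &Weights) -> f64 {
    s.quality * w.quality + s.speed * w.speed + s.fit * w.fit + s.context * w.context
}

fn main() {
    let mut models = vec![
        ("Qwen3-8B", Scores { quality: 90.0, speed: 85.0, fit: 100.0, context: 80.0 }),
        ("Llama-3.3-70B", Scores { quality: 96.0, speed: 30.0, fit: 40.0, context: 85.0 }),
    ];
    // Rank best-first by composite score.
    models.sort_by(|a, b| composite(&b.1, &CODING).total_cmp(&composite(&a.1, &CODING)));
    for (name, s) in &models {
        println!("{name}: {:.1}", composite(s, &CODING));
    }
}
```

Under this profile, a strong raw model that fits poorly (the 70B at Q2_K) still ranks below a slightly weaker model that runs comfortably.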
30 Providers
Meta Llama
Mistral AI
Alibaba Qwen
Google Gemma
Microsoft Phi
DeepSeek
IBM Granite
xAI Grok
Cohere
BigCode
01.ai Yi
Upstage SOLAR
TII Falcon
HuggingFace
Zhipu GLM
Moonshot Kimi
Baidu ERNIE
Allen Institute
LMSYS Vicuna
NousResearch
Stability AI
BigScience
WizardLM
OpenChat
Nomic
BAAI
Ant Group
Rednote
Meituan
Community

Stop guessing. Start running.

One command tells you which models fit your hardware. Written in Rust. Zero runtime dependencies. Works offline.

View on GitHub | crates.io