LLM Resource Usage & Cost Calculator

Analyse memory requirements, throughput and cloud spend for open-source language models in seconds.

Curated presets

Start from modern open-weight models with known serving notes.

6 presets
Parameters
7.000 B
Exact / sourced value
Hidden size
4096
Layers
32

Quick estimator

Size weights and KV cache instantly from core workload inputs.

Weights/KV: exact arithmeticActivations/overhead: heuristic
Concurrent users
1

KV cache memory scales linearly with concurrent sequences and context length.

Architecture assumptions

Using LLaMA-style scaling heuristics derived from parameter count. Enable manual mode to match a specific checkpoint.

Heuristic architecture
Hidden size
4096
Layers
32
Attention heads
32
Feed-forward size
16384

Hidden size and depth follow public LLaMA scaling heuristics. Switch to manual mode to match custom architectures.

Memory & hardware

Assumes 1 concurrent user at 4096 tokens.

Weight memory and KV cache are exact arithmetic from your inputs. Activations, optimizer state, total VRAM, fit checks, and GPU recommendations are heuristic estimates because they depend on runtime behavior and overhead assumptions.

Model weights13.04 GB
Activations2.61 GB
KV cache16-bit2.00 GB
Total before overhead17.65 GB
Framework overhead (1.15×)2.65 GB
Total VRAM needed20.29 GB

Fits with 11.71 GB headroom.

Closest matching GPUs

  • NVIDIA GeForce RTX 30903.71 GB spare
  • NVIDIA GeForce RTX 40903.71 GB spare
  • NVIDIA GeForce RTX 509011.71 GB spare
  • NVIDIA A100 PCIe 40GB19.71 GB spare
  • NVIDIA RTX 6000 Ada27.71 GB spare

Performance

All performance outputs below are heuristic estimates. They use a simplified compute model and default to total parameters when no active-parameter metadata is available.

GPU FP32 throughput: 104.9 TFLOPs

Adjust the efficiency multiplier to reflect framework and kernel optimisations.

Estimated FLOPs / sequence: 27.46 TFLOPs

Tokens per second: 2,247.86

Milliseconds per token: 0.44

Cloud cost projection

Cloud cost is exact arithmetic from hourly rate × runtime. It does not include hidden provider fees outside the selected rate.

Effective hourly rate: $1.01

Projected cost: $1.01