Version: 1.0.0
Last Updated: 2026-01-11
Bit-TTT Engine is a high-performance language model implementation combining:

- 1.58-bit quantized linear layers (`BitLinear`)
- Test-Time Training layers (`TTTLayer`) that update their inner weights online during inference
- a Llama-style transformer stack (RMSNorm, SwiGLU) implemented in Rust
### `cortex_rust::layers`

| Layer | Purpose | Parameters |
|---|---|---|
| RMSNorm | Root Mean Square Normalization | dim, eps |
| BitLinear | 1.58-bit quantized linear layer | in_dim, out_dim |
| SwiGLU | Gated MLP with SiLU activation | hidden_dim, intermediate_dim |
| TTTLayer | Test-Time Training layer | hidden_dim, inner_lr |
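For intuition, here is a minimal sketch of absmean ternary quantization, the usual scheme behind 1.58-bit weights. It only illustrates the idea of `BitLinear`; the function name and the per-tensor scaling shown are assumptions, not the crate's actual kernel.

```rust
/// Hypothetical sketch (not the crate's API): quantize weights to {-1, 0, +1}
/// with a single absmean scale, as 1.58-bit schemes typically do.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // Per-tensor scale = mean absolute value (guard against an all-zero tensor).
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len().max(1) as f32;
    let scale = if mean_abs > 0.0 { mean_abs } else { 1.0 };

    // Round each scaled weight to the nearest ternary value.
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}
```

With ternary weights, the matrix multiply inside the layer reduces to additions and subtractions plus a single multiplication by `scale`, which is where the memory and compute savings come from.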
### `cortex_rust::model`

| Component | Description |
|---|---|
| BitLlamaConfig | Model configuration (vocab_size, hidden_dim, num_layers, inner_lr) |
| BitLlamaBlock | Single transformer block: Norm → TTT → Norm → MLP |
| BitLlama | Full model with embedding, N blocks, and LM head |
| Llama | High-level API with tokenizer and state management |
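The Norm → TTT → Norm → MLP ordering above can be read as a standard pre-norm residual block. The sketch below is a self-contained illustration of that ordering only; the `Vec<f32>` hidden state, the closures standing in for sub-layers, and the residual placement are all assumptions, not `BitLlamaBlock`'s real signature.

```rust
// Hypothetical sketch of the block ordering Norm → TTT → Norm → MLP with
// pre-norm residual connections; sub-layers are passed in as closures.
fn block_forward(
    x: &[f32],
    norm1: impl Fn(&[f32]) -> Vec<f32>,
    ttt: impl Fn(&[f32]) -> Vec<f32>,
    norm2: impl Fn(&[f32]) -> Vec<f32>,
    mlp: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    // TTT sub-block: h = x + TTT(Norm(x))
    let h: Vec<f32> = x.iter().zip(ttt(&norm1(x))).map(|(a, b)| a + b).collect();
    // MLP sub-block: out = h + MLP(Norm(h))
    h.iter().zip(mlp(&norm2(&h))).map(|(a, b)| a + b).collect()
}
```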
### `bit_llama::train`

| Module | Responsibility |
|---|---|
| args.rs | CLI argument parsing (dim, layers, lr, steps, etc.) |
| checkpoint.rs | Training state persistence (save/load) |
| training_loop.rs | Main training loop with cosine LR schedule |
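training_loop.rs is described as using a cosine LR schedule; the sketch below shows one common formulation with linear warmup. The function name and signature are assumptions for illustration, not the module's actual API.

```rust
use std::f64::consts::PI;

/// Hypothetical sketch: linear warmup to `peak_lr`, then cosine decay to `min_lr`.
fn lr_at(step: usize, total_steps: usize, warmup_steps: usize, peak_lr: f64, min_lr: f64) -> f64 {
    if step < warmup_steps {
        // Linear warmup from zero toward the peak rate.
        return peak_lr * (step as f64 + 1.0) / warmup_steps as f64;
    }
    // Progress through the decay phase, clamped to [0, 1].
    let decay_steps = total_steps.saturating_sub(warmup_steps).max(1);
    let t = ((step - warmup_steps) as f64 / decay_steps as f64).min(1.0);
    min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + (PI * t).cos())
}
```

With the defaults listed in the training parameter table later in this spec (lr = 3e-4, warmup_steps = 100, min_lr = 1e-5), the rate ramps up over the first 100 steps and then decays toward 1e-5.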
```
Input Text
│
▼
┌─────────────────┐
│ Tokenizer │ → Token IDs [u32]
└─────────────────┘
│
▼
┌─────────────────┐
│ Embedding │ → Hidden States (B, T, D)
└─────────────────┘
│
▼ (× N layers)
┌─────────────────┐
│ BitLlamaBlock │
│ ├─ RMSNorm │
│ ├─ TTTLayer │ → Online weight update
│ ├─ RMSNorm │
│ └─ SwiGLU │
└─────────────────┘
│
▼
┌─────────────────┐
│ RMSNorm │
│ LM Head │ → Logits (B, T, V)
└─────────────────┘
│
▼
┌─────────────────┐
│ Sampling │ → Next Token
└─────────────────┘
```
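The sampling stage is not specified further; as one common choice, the sketch below shows temperature sampling over the final logits. The engine's actual sampler may differ, and the function shown is hypothetical.

```rust
/// Hypothetical sketch of temperature sampling; `u` is a uniform random
/// number in [0, 1) supplied by the caller.
fn sample_token(logits: &[f32], temperature: f32, u: f32) -> usize {
    // Scale logits by temperature (lower = greedier), then apply a stable softmax.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature.max(1e-6)).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Inverse-CDF sampling: walk the cumulative probabilities until they pass `u`.
    let mut cum = 0.0;
    for (i, e) in exps.iter().enumerate() {
        cum += e / sum;
        if u < cum {
            return i;
        }
    }
    logits.len() - 1 // numerical fallback for rounding at the tail
}
```

Greedy decoding would instead take the argmax of the logits.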
### `.safetensors`

Standard safetensors format with weight names:

- `embed.weight`
- `layers.{i}.norm1.weight`
- `layers.{i}.ttt.down.weight`
- `layers.{i}.ttt.up.weight`
- `layers.{i}.norm2.weight`
- `layers.{i}.mlp.gate_proj.weight`
- `layers.{i}.mlp.down_proj.weight`
- `layers.{i}.mlp.up_proj.weight`
- `norm_f.weight`
- `lm_head.weight`

### `config.json`

```json
{
  "vocab_size": 16384,
  "hidden_dim": 256,
  "num_layers": 8,
  "inner_lr": 0.1
}
```
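A minimal sketch of reading `config.json` with serde; the field names follow the JSON above, but the struct is illustrative rather than the crate's actual `BitLlamaConfig` definition.

```rust
use serde::Deserialize;

/// Hypothetical mirror of config.json; the real config may carry more fields.
#[derive(Debug, Deserialize)]
struct Config {
    vocab_size: usize,
    hidden_dim: usize,
    num_layers: usize,
    inner_lr: f32,
}

fn load_config(path: &str) -> Result<Config, Box<dyn std::error::Error>> {
    // Read the file and deserialize the JSON into the struct above.
    let text = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&text)?)
}
```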
### `.bitt`

Single-file format containing:
- `BITT` magic (4 bytes)

### Training Parameters

| Parameter | Default | Description |
|---|---|---|
| dim | 256 | Model hidden dimension |
| layers | 8 | Number of transformer blocks |
| context_len | 128 | Maximum context length |
| batch_size | 16 | Training batch size |
| lr | 3e-4 | Peak learning rate |
| warmup_steps | 100 | LR warmup steps |
| min_lr | 1e-5 | Minimum learning rate |
| save_interval | 500 | Checkpoint save frequency |
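As an illustration of how args.rs could expose these defaults on the command line, here is a sketch using a clap-style derive; the parser, struct, and flag names are assumptions, since this spec does not show the actual args.rs code.

```rust
use clap::Parser;

/// Hypothetical CLI arguments mirroring the defaults above (not the real args.rs).
#[derive(Parser, Debug)]
struct TrainArgs {
    /// Model hidden dimension
    #[arg(long, default_value_t = 256)]
    dim: usize,
    /// Number of transformer blocks
    #[arg(long, default_value_t = 8)]
    layers: usize,
    /// Maximum context length
    #[arg(long, default_value_t = 128)]
    context_len: usize,
    /// Training batch size
    #[arg(long, default_value_t = 16)]
    batch_size: usize,
    /// Peak learning rate
    #[arg(long, default_value_t = 3e-4)]
    lr: f64,
    /// LR warmup steps
    #[arg(long, default_value_t = 100)]
    warmup_steps: usize,
    /// Minimum learning rate
    #[arg(long, default_value_t = 1e-5)]
    min_lr: f64,
    /// Checkpoint save frequency (steps)
    #[arg(long, default_value_t = 500)]
    save_interval: usize,
}
```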
| Configuration | Minimum VRAM | Recommended |
|---|---|---|
| 256-dim, 8-layer | 2 GB | 4 GB |
| 512-dim, 12-layer | 4 GB | 8 GB |
| 1024-dim, 24-layer | 8 GB | 16 GB |
```rust
use cortex_rust::{BitLlama, BitLlamaConfig, Llama};

// Load model
let llama = Llama::load_auto("models/my_model")?;

// Stream completion
llama.stream_completion("Hello", 100, 0.8, |token| {
    print!("{}", token);
    Ok(true)
})?;
```
```python
import cortex_rust

# vocab_size, hidden_dim, num_layers, inner_lr
config = cortex_rust.BitLlamaConfig(16384, 256, 8, 0.1)
model = cortex_rust.BitLlama(config, "model.safetensors", device="cuda")
logits = model.forward(token_id=42)
```