Bit-TTT Engine is a pure-Rust implementation combining two cutting-edge techniques: 1.58-bit quantization and Test-Time Training (TTT).
The goal: run 70B-parameter models on 8–16 GB of VRAM with efficient inference.
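To illustrate the first technique: 1.58-bit quantization maps each weight to the ternary set {−1, 0, +1} plus a single per-tensor scale (log2(3) ≈ 1.58 bits per weight). Below is a minimal sketch of one common scheme, absmean quantization as described for BitNet b1.58; the function name is hypothetical and this is not the actual `cortex_rust` / `bit_linear.rs` implementation.

```rust
/// Quantize f32 weights to ternary {-1, 0, +1} with a per-tensor scale.
/// Sketch of BitNet-b1.58-style "absmean" quantization; illustrative only.
pub fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale = mean absolute value of the weights.
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    // Divide by the scale, round to nearest integer, clamp into {-1, 0, 1}.
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}
```

At inference time the matmul against ternary weights reduces to additions and subtractions (multiplications by ±1 or 0), which is where the memory and compute savings come from.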
| Feature | Status | Description |
|---|---|---|
| Core Engine (cortex_rust) | ✅ Complete | Candle-based neural network implementation |
| Training Pipeline | ✅ Complete | End-to-end training in pure Rust |
| Streaming Inference | ✅ Complete | ~1100 tokens/sec on CPU |
| GUI Trainer | ✅ Complete | Tauri-based visual training interface |
| Python Bindings (PyO3) | ✅ Complete | Optional Python integration |
| Japanese Tokenizer | 🚧 Planned | Phase 14 |
| 7B/70B Scaling | 🚧 Planned | Phase 15 |
| WASM/Browser Support | 🚧 Planned | Phase 16 |
Bit-TTT Engine
├── crates/
│ ├── rust_engine/ # Core library (cortex_rust)
│ │ ├── layers/ # Neural network layers
│ │ │ ├── rms_norm.rs # RMS Normalization
│ │ │ ├── bit_linear.rs # 1.58-bit Linear Layer
│ │ │ ├── swiglu.rs # SwiGLU MLP
│ │ │ └── ttt.rs # TTT Layer
│ │ ├── model/ # Model architecture
│ │ │ ├── block.rs # Transformer Block
│ │ │ ├── llama.rs # BitLlama Model
│ │ │ └── config.rs # Configuration
│ │ ├── python.rs # PyO3 bindings
│ │ └── lib.rs # Public API
│ │
│ └── bit_llama/ # CLI application
│ ├── train/ # Training module
│ │ ├── args.rs # CLI arguments
│ │ ├── checkpoint.rs # State management
│ │ └── training_loop.rs # Main loop
│ ├── gui/ # Tauri GUI
│ └── inference.rs # Inference engine
│
├── models/ # Trained model checkpoints
├── data/ # Training datasets
└── tools/ # Utility scripts
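The `layers/ttt.rs` entry above implements the second core technique, Test-Time Training: a TTT layer carries a small "fast weight" that is updated by a gradient step on a self-supervised loss for every token, even during inference. The diagonal-weight sketch below is illustrative only (hypothetical struct and field names); the real layer in `crates/rust_engine/layers/ttt.rs` is necessarily more involved.

```rust
/// Minimal sketch of a Test-Time Training (TTT) layer.
/// The fast weight `w` (diagonal here, for simplicity) is adapted
/// by one inner-loop gradient step per token.
pub struct TttLayerSketch {
    pub w: Vec<f32>,   // fast weight, updated at test time
    pub inner_lr: f32, // inner-loop learning rate (cf. `inner_lr` in the config)
}

impl TttLayerSketch {
    /// Process one token: produce the output, then adapt `w` on a
    /// self-supervised reconstruction loss L = 0.5 * ||w ⊙ x - x||^2.
    pub fn forward(&mut self, x: &[f32]) -> Vec<f32> {
        let y: Vec<f32> = x.iter().zip(&self.w).map(|(xi, wi)| xi * wi).collect();
        for i in 0..self.w.len() {
            // dL/dw_i = (y_i - x_i) * x_i
            let grad = (y[i] - x[i]) * x[i];
            self.w[i] -= self.inner_lr * grad;
        }
        y
    }
}
```

The key design point is that the layer's hidden state is itself a tiny learnable model, so the sequence dimension is handled by learning rather than by attention.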
git clone https://github.com/imonoonoko/Bit-TTT-Engine.git
cd Bit-TTT-Engine
cargo build --release
# Using launch script (Windows)
./launch_trainer.bat
# Manual training
cargo run --release --bin train_llama -- \
--data data/TinyStories \
--dim 256 \
--layers 8 \
--steps 10000 \
--lr 3e-4
# Using launch script (Windows)
./launch_chat.bat
# Manual inference
cargo run --release --bin bit_llama -- \
--model models/my_model \
--prompt "Hello Bit-TTT!" \
--max-tokens 100 \
--temp 0.8
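The `--temp` flag controls how sharply the sampler concentrates on high-probability tokens: logits are divided by the temperature before the softmax, so values below 1.0 sharpen the distribution and values above 1.0 flatten it. A minimal sketch of temperature-scaled softmax (a hypothetical helper, not the engine's actual sampler):

```rust
/// Softmax over logits with temperature scaling.
/// Subtracting the max logit first keeps the exponentials numerically stable.
pub fn softmax_with_temp(logits: &[f32], temp: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| ((l - max) / temp).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

With `--temp 0.8` the model samples slightly more conservatively than the raw distribution; pushing toward 0 approaches greedy decoding.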
| Document | Description |
|---|---|
| ARCHITECTURE.md | System design philosophy |
| ROADMAP.md | Future development plans |
| docs/SPECIFICATION.md | Technical specification |
| docs/CONTRIBUTING.md | Contribution guide |
# Run all tests
cargo test --workspace
# Check compilation
cargo check --workspace
# Format code
cargo fmt --all
# Run linter
cargo clippy --workspace
cd crates/rust_engine
pip install maturin
maturin develop --release
import cortex_rust
config = cortex_rust.BitLlamaConfig(
vocab_size=16384,
hidden_dim=256,
num_layers=8,
inner_lr=0.1
)
model = cortex_rust.BitLlama(config, "model.safetensors", device="cuda")
logits = model.forward(token_id=42)
Solana Wallet: 13ui3nmE7smmK3Pk8wyKb7RE6wHyMJCcWgCeMRRdoory
Created by Project Bit-TTT • MIT License