Pure Rust LLM inference engine with Soul learning and hierarchical memory.
Status: v1.0.0 (development complete). The project is fully functional but is no longer under active development.
A local LLM inference engine written entirely in Rust. It runs GGUF and safetensors models on your PC, with a unique Soul system that lets the AI learn and remember across conversations.
Key features:

- Pure-Rust inference for GGUF and safetensors models (CPU and CUDA)
- Soul system: the AI learns and remembers across conversations
- Hierarchical memory with LoRA-based Soul promotion
- OpenAI-compatible HTTP server
- Desktop GUI (Tauri 2.0 + Svelte 5)

Install:

```bash
# Homebrew (macOS / Linux)
brew tap imonoonoko/bitllama && brew install bitllama

# Windows (winget)
winget install imonoonoko.BitLlama

# Or download a binary from GitHub Releases
```
Quick start:

```bash
# Download a model from Hugging Face
bitllama pull bartowski/gemma-2-2b-it-GGUF

# Run it
bitllama run ~/.bitllama/models/gemma-2-2b-it-Q4_K_M.gguf

# Teach it something and save a Soul
bitllama learn "My name is Onoko" --model model.gguf --save onoko.soul

# Run with that Soul loaded
bitllama run model.gguf --soul onoko.soul

# Serve an OpenAI-compatible API (POST /v1/chat/completions)
bitllama serve model.gguf --port 8000
```
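Once the server is running, any OpenAI-style client can talk to it. A minimal sketch with curl, assuming the request follows the standard OpenAI chat-completions schema (the model name and prompt are placeholders, not values from this project):

```bash
# Assumes `bitllama serve model.gguf --port 8000` is running locally;
# "model.gguf" and the prompt are illustrative placeholders.
BODY='{"model": "model.gguf", "messages": [{"role": "user", "content": "Hello!"}]}'
curl -s -X POST http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d "$BODY" || echo "(server not running on port 8000)"
```

The same endpoint should work with any OpenAI SDK pointed at `http://localhost:8000/v1`.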
BitLlama Desktop is a GUI built with Tauri 2.0 and Svelte 5.

```bash
# Install
winget install imonoonoko.BitLlamaDesktop

# Or build from source
cd bitllama-desktop && npm install && npx tauri build
```
| Model | Format | Chat Template |
|---|---|---|
| Llama-2 7B/13B | GGUF | llama2 |
| Llama-3 8B | GGUF | llama3 |
| Gemma-2 2B/9B | GGUF | gemma |
| Gemma-3 | GGUF | gemma |
| Qwen2.5 0.5B-7B | GGUF | chatml |
| Mistral 7B | GGUF | mistral |
| BitNet 2B4T | safetensors | bitnet |
GGUF quantizations: Q4_K_M, Q6_K, Q8_0, F16.
Benchmarks on an RTX 4060 Ti (8 GB) at Q4_K_M:
| Model | Speed | vs llama.cpp |
|---|---|---|
| Llama-2 7B | 45.4 tok/s | 90% |
| Mistral 7B | 42.1 tok/s | 89% |
| Gemma-2 2B | 75.1 tok/s | 74% |
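As a rough sanity check on why a 7B model at Q4_K_M fits an 8 GB card, Q4_K_M averages about 4.8 bits per weight (a general figure for this quantization family, not a number measured from BitLlama):

```bash
# Approximate weight memory for 7e9 parameters at ~4.8 bits/weight
python3 -c "print(f'{7e9 * 4.8 / 8 / 1024**3:.1f} GiB')"
# → 3.9 GiB
```

That leaves several gigabytes free for the KV cache and activations alongside the weights.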
```
Bit-TTT-Engine/
├── crates/
│   ├── bit_llama/       # CLI application
│   ├── rust_engine/     # Core inference engine (GGUF, CUDA, LoRA, KV cache)
│   └── bit_converter/   # Model conversion utilities
├── bitllama-desktop/    # Desktop GUI (Tauri 2.0 + Svelte 5)
└── docs/                # Documentation
```
```
Conversations → Episodes (L0) → Sleep → Facts (L1) → Concepts (L2) → Worldview (L3)
                                                          ↓
                            Soul Promotion (LoRA fine-tuning from stable patterns)
```
Building from source:

```bash
# CLI
cargo build --release -p bit_llama

# With CUDA
cargo build --release -p bit_llama --features cuda

# Desktop
cd bitllama-desktop && npm install && npx tauri build

# Tests
cargo test --no-default-features --lib
```
Requirements: Rust 1.75+, CUDA 12.x (optional)
This project was developed solo over three weeks (Jan-Feb 2026).
MIT License — see LICENSE.
Built with Rust by @imonoonoko