BitLlama (Bit-TTT-Engine)


Pure Rust LLM inference engine with Soul learning and hierarchical memory.

Status: v1.0.0, feature-complete. The project is fully functional but no longer under active development.


What is BitLlama?

A local LLM inference engine written entirely in Rust. It runs GGUF and safetensors models on your PC, with a unique Soul system that lets the AI learn and remember across conversations.

Key features:

- Runs GGUF and safetensors models locally, with optional CUDA acceleration
- Soul system: the model learns and remembers across conversations via portable .soul files
- Hierarchical memory: Episodes → Facts → Concepts → Worldview
- Soul Promotion: LoRA fine-tuning from stable memory patterns
- OpenAI-compatible API server
- Desktop GUI (Tauri 2.0 + Svelte 5)

Quick Start

Install

# Homebrew (macOS / Linux)
brew tap imonoonoko/bitllama && brew install bitllama

# Windows (winget)
winget install imonoonoko.BitLlama

# Or download from GitHub Releases

Run

bitllama pull bartowski/gemma-2-2b-it-GGUF
bitllama run ~/.bitllama/models/gemma-2-2b-it-Q4_K_M.gguf

Teach

bitllama learn "My name is Onoko" --model model.gguf --save onoko.soul
bitllama run model.gguf --soul onoko.soul

API Server

bitllama serve model.gguf --port 8000
# OpenAI-compatible: POST /v1/chat/completions
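Since the endpoint follows the OpenAI chat-completions schema, any OpenAI-style client can talk to it. A minimal curl request might look like this (the `model` value is a placeholder for whichever file you loaded; port 8000 matches the command above, and the server must be running):

```shell
# Assumes `bitllama serve model.gguf --port 8000` is running locally.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "model.gguf",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The response should be a standard chat-completion JSON object, so existing OpenAI SDKs can be pointed at `http://localhost:8000/v1` as a drop-in base URL.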

Desktop GUI

BitLlama Desktop — built with Tauri 2.0 + Svelte 5.

# Install
winget install imonoonoko.BitLlamaDesktop

# Or build from source
cd bitllama-desktop && npm install && npx tauri build

Supported Models

| Model            | Format      | Chat Template |
|------------------|-------------|---------------|
| Llama-2 7B/13B   | GGUF        | llama2        |
| Llama-3 8B       | GGUF        | llama3        |
| Gemma-2 2B/9B    | GGUF        | gemma         |
| Gemma-3          | GGUF        | gemma         |
| Qwen2.5 0.5B-7B  | GGUF        | chatml        |
| Mistral 7B       | GGUF        | mistral       |
| BitNet 2B4T      | safetensors | bitnet        |

GGUF quantizations: Q4_K_M, Q6_K, Q8_0, F16.
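As a rough guide to disk and VRAM footprint, a GGUF file's size is approximately parameter count times effective bits per weight. Taking roughly 4.85 bits per weight for Q4_K_M (an approximation, not a figure from this project), a back-of-the-envelope estimate for a 7B model:

```shell
# Rough GGUF size estimate: params × effective bits/weight ÷ 8 bytes, in GiB.
# 4.85 bits/weight for Q4_K_M is an approximation; actual files vary slightly.
awk 'BEGIN {
  params = 7e9; bpw = 4.85
  printf "%.1f GiB\n", params * bpw / 8 / 1024^3
}'
# → 4.0 GiB
```

The same arithmetic puts Q8_0 near 7 GiB and F16 near 13 GiB for a 7B model, which is why Q4_K_M is the practical choice on 8 GB GPUs.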


Performance

Measured on an RTX 4060 Ti 8GB, Q4_K_M quantization:

| Model      | Speed      | vs llama.cpp |
|------------|------------|--------------|
| Llama-2 7B | 45.4 tok/s | 90%          |
| Mistral 7B | 42.1 tok/s | 89%          |
| Gemma-2 2B | 75.1 tok/s | 74%          |

Architecture

Bit-TTT-Engine/
├── crates/
│   ├── bit_llama/        # CLI application
│   ├── rust_engine/      # Core inference engine (GGUF, CUDA, LoRA, KV Cache)
│   └── bit_converter/    # Model conversion utilities
├── bitllama-desktop/     # Desktop GUI (Tauri 2.0 + Svelte 5)
└── docs/                 # Documentation

Soul & Memory Architecture

Conversations → Episodes (L0) → Sleep → Facts (L1) → Concepts (L2) → Worldview (L3)
                                  ↓
                            Soul Promotion (LoRA fine-tuning from stable patterns)

Build from Source

# CLI
cargo build --release -p bit_llama

# With CUDA
cargo build --release -p bit_llama --features cuda

# Desktop
cd bitllama-desktop && npm install && npx tauri build

# Tests
cargo test --no-default-features --lib

Requirements: Rust 1.75+, CUDA 12.x (optional)


What Was Built

This project was developed over 3 weeks (Jan-Feb 2026) as a solo effort. Final stats:


Acknowledgments

License

MIT License — see LICENSE.


Built with Rust by @imonoonoko