Bit-TTT-Engine

Bit-TTT Engine Architecture

1. Core Philosophy

Bit-TTT aims to bridge the gap between “Ultra-efficient Inference” and “Adaptive Learning”. We combine two technologies into a single, portable runtime:

  1. 1.58-bit Quantization (BitNet b1.58): Ternary parameters {-1, 0, 1} for extreme efficiency.
  2. Test-Time Training (TTT): On-the-fly context learning using “Fast Weights” instead of a static KV-cache.
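To make the ternary format concrete, here is a minimal Rust sketch of absmean quantization in the style described by the BitNet b1.58 paper. The function name and per-tensor scaling granularity are illustrative assumptions, not the engine's actual API.

```rust
/// Quantize float weights to the ternary set {-1, 0, 1} (illustrative;
/// not the engine's real API). Absmean scheme: scale by the mean
/// absolute value, then round-and-clip each weight.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    // Per-tensor scale: mean absolute value, with an epsilon to avoid
    // dividing by zero on an all-zero tensor.
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>()
        / weights.len().max(1) as f32
        + 1e-6;

    // Round each scaled weight and clip it into {-1, 0, 1}.
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();

    // Keep the scale so outputs can be rescaled after the ternary matmul.
    (q, scale)
}
```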

2. System Overview

The project follows a Rust-First, Python-Compatible architecture.

```mermaid
graph TD
    A["Python (PyO3)"] -->|Direct Bindings| B["Rust Core Engine"]
    B -->|Candle (SIMD/AVX)| C["CPU / GPU"]

    subgraph Rust Core
    D["BitLlama (Model)"]
    E["TTT Layer (Fast Weights)"]
    F["BitLinear (Ternary Weights)"]
    end

    B --> D
    D --> E
    D --> F
```

Component Details

| Module | Role | Tech Stack |
| --- | --- | --- |
| `crates/core_engine` | Neural network logic | Candle tensor framework. Supports CPU/CUDA. |
| `crates/cortex_rust` | Python interface | PyO3. Exposes the `BitLlama` class directly to Python. |
| `legacy` | Deprecated interop | Old `extern "C"` / ndarray implementation (isolated). |
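For readers unfamiliar with PyO3, the following is a hypothetical sketch of how `crates/cortex_rust` might expose `BitLlama` as a Python class. The constructor and method signatures are placeholders, not the crate's actual interface.

```rust
use pyo3::prelude::*;

/// Hypothetical #[pyclass] wrapper that owns the Rust model and is
/// exposed directly to Python.
#[pyclass]
struct BitLlama {
    // The real crate would hold the core_engine model here; it is
    // elided to keep the sketch self-contained.
    vocab_size: usize,
}

#[pymethods]
impl BitLlama {
    #[new]
    fn new(vocab_size: usize) -> Self {
        Self { vocab_size }
    }

    /// Placeholder forward pass; the real method would run the model.
    fn forward(&self, token_ids: Vec<u32>) -> PyResult<Vec<f32>> {
        Ok(vec![0.0; self.vocab_size * token_ids.len().max(1)])
    }
}

/// Module registration so Python can `from cortex_rust import BitLlama`.
#[pymodule]
fn cortex_rust(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<BitLlama>()?;
    Ok(())
}
```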

3. Data Flow (Inference)

Standard TTT Forward

  1. Input: Token IDs arrive from Python.
  2. Zero-Copy: Data is passed to Rust without copying via the PyO3 buffer protocol (see the zero-copy sketch after this list).
  3. Forward Pass:
    • Embedding: Table lookup.
    • TTT Update: W_state is updated via online gradient descent (see the fast-weight sketch below).
    • Projection: 1.58-bit matrix multiplication (see the ternary matmul sketch below).
  4. Output: Logits are returned to Python as a Tensor.
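Step 2 can be illustrated with rust-numpy, which implements the buffer protocol on the Rust side. This is a sketch under the assumption that Python hands over a contiguous NumPy array of token IDs; the function name is hypothetical.

```rust
use numpy::PyReadonlyArray1;
use pyo3::prelude::*;

/// Hypothetical receiver for token IDs. `PyReadonlyArray1` borrows the
/// NumPy buffer via the buffer protocol; `as_slice` views it as &[i64]
/// without copying (it fails only if the array is non-contiguous).
#[pyfunction]
fn embed_tokens(ids: PyReadonlyArray1<'_, i64>) -> PyResult<usize> {
    let ids: &[i64] = ids.as_slice()?;
    // The real engine would run the forward pass here; we just return
    // the sequence length to keep the sketch self-contained.
    Ok(ids.len())
}
```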
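The TTT update step maintains fast weights by descending a per-token loss. As an assumption for illustration (the actual objective is not specified here), take a linear fast weight trained to reconstruct a value vector from a key vector, L = ½‖W k − v‖²; one online step is then the rank-1 update below.

```rust
/// One online TTT step for a linear fast-weight matrix, sketched with
/// plain slices. Assumed loss: L = 0.5 * ||W k - v||^2, which gives the
/// rank-1 update W <- W - lr * (W k - v) k^T.
fn ttt_update(w_state: &mut [Vec<f32>], key: &[f32], value: &[f32], lr: f32) {
    for (row, &v) in w_state.iter_mut().zip(value) {
        // Prediction error e_i = (W k)_i - v_i for this output row.
        let pred: f32 = row.iter().zip(key).map(|(w, k)| w * k).sum();
        let err = pred - v;
        // Gradient descent on the row: w_ij -= lr * e_i * k_j.
        for (w, &k) in row.iter_mut().zip(key) {
            *w -= lr * err * k;
        }
    }
}
```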
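Finally, the projection step is where the ternary weights pay off: with every weight in {-1, 0, 1}, the matrix multiplication needs no float multiplies at all, only additions, subtractions, and skips, plus one rescale per output. A minimal matrix-vector sketch (helper name is illustrative):

```rust
/// 1.58-bit projection sketch: ternary weights reduce the matmul to
/// add / subtract / skip, with one float rescale per output row.
fn ternary_matvec(weights: &[Vec<i8>], x: &[f32], scale: f32) -> Vec<f32> {
    weights
        .iter()
        .map(|row| {
            let mut acc = 0.0f32;
            for (&w, &v) in row.iter().zip(x) {
                match w {
                    1 => acc += v,  // +1: add the activation
                    -1 => acc -= v, // -1: subtract it
                    _ => {}         //  0: skip entirely
                }
            }
            acc * scale // undo the absmean scaling from quantization
        })
        .collect()
}
```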

4. Safety & Build Options