ion7-labs
Local LLM inference runtime for LuaJIT.
Direct FFI into libllama.so — microseconds, not milliseconds.
0 Python · 0 malloc / token · 0 HTTP overhead · 484 documented functions
zero malloc / token
llama_batch pre-allocated at context creation. KV cache managed, never reallocated. Every generated token is a pure compute step.
direct FFI
No subprocess, no HTTP, no JSON serialization. LuaJIT FFI calls go straight into libllama.so; call overhead is measured in microseconds.
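To make the two cards above concrete, here is a minimal raw-FFI sketch of the pattern: one llama_batch allocated up front and reused for every token. The cdef is a trimmed, hand-written subset of llama.h and may not match every llama.cpp version; ion7's own bridge code is not shown here.

local ffi = require "ffi"

-- Trimmed, illustrative subset of llama.h (check against your llama.cpp version).
ffi.cdef[[
typedef int32_t llama_token;
typedef struct llama_batch {
  int32_t       n_tokens;
  llama_token * token;
  float       * embd;
  int32_t     * pos;
  int32_t     * n_seq_id;
  int32_t    ** seq_id;
  int8_t      * logits;
} llama_batch;
llama_batch llama_batch_init(int32_t n_tokens, int32_t embd, int32_t n_seq_max);
void        llama_batch_free(llama_batch batch);
]]

local llama = ffi.load("libllama.so")

-- One batch, allocated once; per-token generation only rewrites its fields.
local batch = llama.llama_batch_init(512, 0, 1)

local function push_token(pos, tok)
  batch.n_tokens     = 1
  batch.token[0]     = tok
  batch.pos[0]       = pos
  batch.n_seq_id[0]  = 1
  batch.seq_id[0][0] = 0
  batch.logits[0]    = 1      -- request logits for this position
end

push_token(0, 42)             -- position 0, arbitrary token id; no allocation on this path
llama.llama_batch_free(batch)

The runtime pre-allocates the batch at context creation, so user code never touches llama_batch directly.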
full llama.cpp surface
84 bridge functions across 4 translation units. Chat templates (Jinja2), LoRA, speculative decoding, grammar, reasoning budget — all exposed.
grammar engine
GBNF, JSON Schema, regex, tool calling in pure Lua. CRANE-style lazy grammar activation. Constrained generation without sacrificing reasoning.
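For a sense of what the grammar layer consumes, here is a small hand-written GBNF grammar that constrains output to a yes/no JSON object; the rule names are illustrative and not taken from ion7's sources. The regex, JSON Schema, and Lua-annotation front ends compile down to this same format.

-- GBNF: each rule is name ::= expansion; literals are quoted, character classes use [...].
local gbnf = [[
root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
answer ::= "\"yes\"" | "\"no\""
ws     ::= [ \t\n]*
]]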
Quick start
local Model   = require "ion7.core.model"
local Sampler = require "ion7.core.sampler"

local model = Model.load("model.gguf", { n_gpu_layers = -1 })
local ctx   = model:context({ n_ctx = 4096 })
local vocab = model:vocab()
local samp  = Sampler.chain(vocab):temp(0.8):top_p(0.95):build()

local tokens, n = vocab:tokenize("Hello, world!", true)
ctx:decode(tokens, n)                  -- prefill the prompt

repeat
  local token = samp:sample(ctx, -1)   -- sample from the logits of the last position
  samp:accept(token)                   -- update sampler state (penalties, etc.)
  io.write(vocab:piece(token))         -- convert the token id to text
  ctx:decode({ token }, 1)             -- feed the sampled token back before the next step
until vocab:is_eog(token)              -- stop at end-of-generation
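Assuming libllama.so is resolvable by the dynamic loader (for example via LD_LIBRARY_PATH) and a GGUF model is on disk, a script like this runs under plain luajit, with no server process to start or connect to.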
Stack
LuaJIT FFI → llama.cpp. Zero malloc per token. 84 bridge functions, 4 translation units.
Grammar engine for LuaJIT. Compiles regex, ABNF, EBNF, JSON Schema, and Lua type annotations to GBNF. Per-seq backtrack + DCCD runtime, pure-Lua fuzzer, format auto-detect.
Chat pipeline + multi-session inference. Per-seq KV snapshots, prefix cache, three-channel streaming, schema-constrained sampling, interleaved-thinking tool loop.
Sparse Autoencoder on LLM embeddings. Built around the superposition hypothesis; 0.91 cosine reconstruction.
Visual node editor. React Flow + Bun WebSocket + LuaJIT executor.
Neovim integration. Subprocess-based streaming token generation.
Local embeddings without llama-server. Cosine similarity, pooling, batch encoding (cosine sketch below).
3-layer persistent memory: hot index + topics + session archives.
SQLite + vector search. Query → embed → retrieve pipeline.
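For reference, the cosine similarity used by the embeddings and vector-search projects above is only a few lines of Lua. This standalone sketch assumes plain Lua tables of numbers rather than whatever vector type those projects use internally.

-- Cosine similarity between two equal-length vectors (plain Lua tables).
local function cosine(a, b)
  local dot, na, nb = 0.0, 0.0, 0.0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na  = na  + a[i] * a[i]
    nb  = nb  + b[i] * b[i]
  end
  return dot / (math.sqrt(na) * math.sqrt(nb))
end

print(cosine({1, 0, 1}, {1, 1, 0}))   --> 0.5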