ion7-labs
LuaJIT × llama.cpp — modular local LLM runtime.
Each module is independent and usable standalone or as part of the full stack.
Available — API reference
LuaJIT FFI → libllama.so. 84 bridge functions across 4 translation units, covering model, context, KV cache, speculative decoding, chat templates (Jinja2), sampling, LoRA, reasoning budget, and grammar constraints.
Zero malloc per generated token. KV snapshot/restore. Prefix cache. Full libcommon surface (DRY, XTC, EAGLE3, NGRAM_CACHE).
API Reference →
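A minimal sketch of the bridge pattern, assuming libllama.so is on the loader path. The two symbols below exist in recent llama.cpp builds, but the C API drifts between releases, so verify against your llama.h — this is illustrative, not the ion7-core surface.

```lua
-- Minimal FFI bridge sketch (illustrative, not ion7-core's API).
-- Assumes a recent llama.cpp build; signatures drift between releases.
local ffi = require("ffi")

ffi.cdef[[
  void        llama_backend_init(void);
  const char *llama_print_system_info(void);
]]

local lib = ffi.load("llama")   -- resolves libllama.so / .dylib / .dll
lib.llama_backend_init()
print(ffi.string(lib.llama_print_system_info()))
```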
Grammar engine for LuaJIT. Eight input formats (regex / ABNF / EBNF / JSON Schema / type DSL / enum / tool / auto-detect), all yielding the same composable Grammar_obj.
AST + LPeg-backed parsers. Per-seq Backtrack and DCCD (multi-tenant safe). GrammarContext for stateful SQL agents. Pure-Lua fuzzer. Composition algebra (union, sequence, wrap, interleave).
API Reference →
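A hedged sketch of the composition algebra. The module path and constructor/combinator names (from_regex, union, sequence) are illustrative guesses, not the documented API — only Grammar_obj and the eight input formats come from the source.

```lua
-- Hypothetical names throughout; only the composition idea is from the docs.
local G = require("ion7.grammar")        -- hypothetical module path

local digits = G.from_regex("[0-9]+")    -- regex input format
local word   = G.from_regex("[a-z]+")    -- every format yields a Grammar_obj
local either = G.union(digits, word)     -- composition algebra: union
local pair   = G.sequence(word, digits)  -- ...and sequence
```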
Chat pipeline + multi-session inference orchestration. Per-seq KV snapshots, prefix cache, slot pool, fork. Engine + Pool (~6× aggregate speedup).
Mid-generation eviction, RadixAttention exact-match prefix cache, Y-Token sink hook. Multi-channel streaming (content / thinking / tool_call_delta / tool_call_done / stop). Format-aware tool extraction (OpenAI / Qwen / Mistral / Hermes). Interleaved-thinking tool loop. Reasoning budget. Embeddings.
API Reference →
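A hedged usage sketch of the Engine + Pool pattern; every name here (Engine.new, pool, spawn, generate) is an assumption, not the documented API.

```lua
-- Illustrative only: constructor and method names are assumptions.
local Engine = require("ion7.engine")              -- hypothetical module path
local eng    = Engine.new({ model = "model.gguf" })
local pool   = eng:pool({ slots = 4 })             -- slot pool; sessions share
                                                   -- the prefix cache
local a = pool:spawn({ system = "Be terse." })
local b = pool:spawn({ system = "Be thorough." })
for chunk in a:generate("hello") do io.write(chunk.content or "") end
```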
In Development — docs coming
Sparse Autoencoder on LLM embeddings. Validates superposition hypothesis: 0.91 cosine reconstruction, 0.500 Jaccard between related concept clusters.
SAE:edit() for primitive surgery (zero / set / scale). 64 primitives, K=16 active. Sparse Adam optimizer, LuaJIT + OpenBLAS FFI. 16× embedding compression.
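A sketch of primitive surgery via SAE:edit(). The source names the operations (zero / set / scale), but the argument shape, loader, and encode/decode calls below are assumptions.

```lua
-- Only SAE:edit() and zero/set/scale come from the docs; the rest is assumed.
local sae = SAE.load("sae-64x16.bin")               -- hypothetical loader
local z   = sae:encode(embedding)                   -- 64 primitives, K=16 active
sae:edit(z, { op = "zero",  idx = 12 })             -- silence one primitive
sae:edit(z, { op = "scale", idx = 3, by = 2.0 })    -- amplify another
local patched = sae:decode(z)                       -- back to embedding space
```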
Visual node editor for ion7 pipelines. React Flow + Bun WebSocket server + LuaJIT executor. Each ion7-core function is a wireable node.
Browser ↔ Bun WS ↔ LuaJIT. Topological execution. Nodes: Model_load, Ctx_decode, Sampler_chain, Generate, Display.
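A minimal sketch of the executor's topological pass (Kahn's algorithm); the node/edge table shapes are assumptions, not the ion7 wire format.

```lua
-- Kahn's algorithm over a node graph; table shapes are illustrative.
-- nodes: map id -> node table; edges: list of { from = id, to = id }.
local function topo_order(nodes, edges)
  local indeg, succ = {}, {}
  for id in pairs(nodes) do indeg[id], succ[id] = 0, {} end
  for _, e in ipairs(edges) do
    indeg[e.to] = indeg[e.to] + 1
    succ[e.from][#succ[e.from] + 1] = e.to
  end
  local queue, order = {}, {}
  for id, d in pairs(indeg) do if d == 0 then queue[#queue + 1] = id end end
  while #queue > 0 do
    local id = table.remove(queue)
    order[#order + 1] = id
    for _, nxt in ipairs(succ[id]) do
      indeg[nxt] = indeg[nxt] - 1
      if indeg[nxt] == 0 then queue[#queue + 1] = nxt end
    end
  end
  return order  -- execution order: Model_load before Ctx_decode, etc.
end
```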
Neovim plugin for in-editor LLM generation. Subprocess-based streaming via jobstart(). Supports multi-turn via --msgs-file.
Protocol: TOKEN:-prefixed lines streamed over stdout.
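A sketch of the consuming side in Neovim. jobstart() and nvim_put() are real Neovim APIs, but the command line and anything in the protocol beyond the TOKEN: prefix are assumptions.

```lua
-- Neovim side of the streaming loop. The CLI invocation is hypothetical;
-- only the TOKEN: line prefix is taken from the docs.
vim.fn.jobstart({ "ion7", "--msgs-file", vim.fn.tempname() }, {
  stdout_buffered = false,
  on_stdout = function(_, lines, _)
    for _, line in ipairs(lines) do
      local tok = line:match("^TOKEN:(.*)")
      if tok then
        vim.api.nvim_put({ tok }, "c", true, true)  -- insert at cursor
      end
    end
  end,
})
```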
Planned
Local embeddings without llama-server. Load Qwen3-Embedding-8B directly via ion7-core FFI — no HTTP, no subprocess.
3-layer persistent memory: hot index (always in context) + topics (on-demand) + session archives (grep only).
Retrieval-Augmented Generation pipeline. SQLite + sqlite-vss vector store. Query → embed → cosine search → context injection.
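The cosine step is plain vector math; a self-contained sketch (generic, not ion7 code):

```lua
-- Cosine similarity between two embedding vectors (plain Lua tables).
local function cosine(a, b)
  local dot, na, nb = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na  = na + a[i] * a[i]
    nb  = nb + b[i] * b[i]
  end
  return dot / (math.sqrt(na * nb) + 1e-12)  -- epsilon guards zero vectors
end
```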
Local text-to-speech via Kokoro-82M FFI. Streaming token→audio pipeline for <250ms first-sound latency in NPC AI pipelines.
Local speech-to-text via Whisper FFI. Streaming voice input with <50ms segment latency.
Fine-tuning and distillation via GGML autograd. LoRA/QLoRA on RTX 3060. Teacher→student distillation. No Python.
Stack layers: Application → High-level → Mid-level → Core