ion7-llm / engine

class

ion7.llm.Engine

_ctx ion7.core.Context
_vocab ion7.core.Vocab
_cm ion7.llm.kv.ContextManager
_default_sampler ion7.core.Sampler?
_opts table
_tok_cdata cdata Pre-allocated `int32_t[1]` for per-token decode.

Functions

Engine.new

Build an engine.

Engine.new(ctx, vocab, cm, opts)
ctx ion7.core.Context
vocab ion7.core.Vocab
cm ion7.llm.kv.ContextManager
opts table?
→ ion7.llm.Engine
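
A minimal construction sketch. Only `Engine.new`'s signature is documented on this page; where the `ctx`, `vocab`, and `cm` values come from, and the keys accepted in `opts`, are assumptions about the surrounding ion7 API:

```lua
local Engine = require("ion7.llm.engine")

-- ctx, vocab, and cm are assumed to be built elsewhere
-- (ion7.core.Context, ion7.core.Vocab, ion7.llm.kv.ContextManager).
-- The opts keys below are illustrative, not documented here.
local engine = Engine.new(ctx, vocab, cm, {
  max_tokens = 512,
})
```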

Engine:chat

Synchronous chat. Decodes the session, samples until a stop condition is reached, and returns a fully-parsed Response.

Engine:chat(session, opts)
session ion7.llm.Session
opts table?
→ ion7.llm.Response
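
A usage sketch for the synchronous path. The `session` construction and the `text` field on the returned Response are assumptions about the wider ion7 API, not confirmed by this page:

```lua
-- session is assumed to be an ion7.llm.Session built elsewhere.
local resp = engine:chat(session, { max_tokens = 256 })  -- opts table is optional

-- `resp.text` is a hypothetical accessor for the response content.
print(resp.text)
```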

Engine:stream

Streaming chat. Returns an iterator that yields typed chunks:

{ kind = "content", text = "..." }
{ kind = "thinking", text = "..." }
{ kind = "tool_call_delta", call_id, name, args_partial }
{ kind = "tool_call_done", call_id, call }
{ kind = "stop", reason = "stop" | "length" | "stop_string" | "tool_use" }

The iterator emits exactly one final `stop` chunk after the model halts. Tool-call chunks fire as soon as the open marker is detected in the content stream, with `tool_call_delta` updates as the arguments JSON accumulates and a `tool_call_done` once the close marker (or balanced JSON braces) closes the call.

Engine:stream(session, opts)
session ion7.llm.Session
opts table? Same as `:chat`.
→ function Coroutine iterator yielding chunks.
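
A consumption sketch for the iterator. The chunk shapes come from the description above; `handle_tool_call` is a hypothetical application-side handler, not part of ion7:

```lua
-- Iterate the coroutine returned by :stream until the final stop chunk.
for chunk in engine:stream(session) do
  if chunk.kind == "content" then
    io.write(chunk.text)
  elseif chunk.kind == "thinking" then
    io.write("[thinking] " .. chunk.text)
  elseif chunk.kind == "tool_call_done" then
    -- chunk.call holds the fully-parsed tool call
    handle_tool_call(chunk.call)  -- hypothetical handler
  elseif chunk.kind == "stop" then
    print("\nstop reason: " .. chunk.reason)
  end
end
```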

Engine:complete

One-shot completion: creates an ephemeral session, runs a chat, and returns the Response. The session is then discarded; no history is preserved.

Engine:complete(prompt, opts)
prompt string
opts table? `system` (string?) plus any `:chat` option.
→ ion7.llm.Response
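
A one-shot sketch. The documented `system` option is shown; the `text` field on the Response is an assumption about the ion7 API:

```lua
-- No session state survives this call.
local resp = engine:complete("Summarize: the build failed twice, then passed.", {
  system = "You are terse.",  -- documented option
  -- any `:chat` option may also appear here
})
print(resp.text)  -- hypothetical Response accessor
```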