ion7-llm / engine

class

ion7.llm.Engine

_ctx ion7.core.Context
_vocab ion7.core.Vocab
_cm ion7.llm.kv.ContextManager
_default_sampler ion7.core.Sampler?
_opts table
_tok_cdata cdata Pre-allocated `int32_t[1]` for per-token decode.

Functions

Engine.new

Build an engine.

Engine.new(ctx, vocab, cm, opts)
ctx ion7.core.Context
vocab ion7.core.Vocab
cm ion7.llm.kv.ContextManager
opts table?
→ ion7.llm.Engine
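
A minimal construction sketch. Only `Engine.new`'s signature is documented on this page; where the `ctx`, `vocab`, and `cm` values come from, and the keys accepted in `opts`, are assumptions about the surrounding ion7 API:

```lua
local Engine = require("ion7.llm.engine")

-- ctx, vocab, and cm are assumed to be built elsewhere
-- (ion7.core.Context, ion7.core.Vocab, ion7.llm.kv.ContextManager).
-- The opts keys below are illustrative, not documented here.
local engine = Engine.new(ctx, vocab, cm, {
  max_tokens = 512,
})
```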

Engine:chat

Synchronous chat. Decodes the session, samples until a stop condition is reached, and returns a fully-parsed Response.

Engine:chat(session, opts)
session ion7.llm.Session
opts table?
→ ion7.llm.Response
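
A usage sketch for the synchronous path. The `session` construction and the `text` field on the returned Response are assumptions about the wider ion7 API, not confirmed by this page:

```lua
-- session is assumed to be an ion7.llm.Session built elsewhere.
local resp = engine:chat(session, { max_tokens = 256 })  -- opts table is optional

-- `resp.text` is a hypothetical accessor for the response content.
print(resp.text)
```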

Engine:stream

Streaming chat. Returns an iterator that yields typed chunks:

{ kind = "content", text = "..." }
{ kind = "thinking", text = "..." }
{ kind = "tool_call_delta", call_id, name, args_partial }
{ kind = "tool_call_done", call_id, call }
{ kind = "stop", reason = "stop" | "length" | "stop_string" | "tool_use" }

The iterator emits exactly one final `stop` chunk after the model halts. Tool-call chunks fire as soon as the open marker is detected in the content stream, with `tool_call_delta` updates as the arguments JSON accumulates and a `tool_call_done` once the close marker (or balanced JSON braces) closes the call.

Engine:stream(session, opts)
session ion7.llm.Session
opts table? Same as `:chat`.
→ function Coroutine iterator yielding chunks.
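
A consumption sketch for the iterator. The chunk shapes come from the description above; `handle_tool_call` is a hypothetical application-side handler, not part of ion7:

```lua
-- Iterate the coroutine returned by :stream until the final stop chunk.
for chunk in engine:stream(session) do
  if chunk.kind == "content" then
    io.write(chunk.text)
  elseif chunk.kind == "thinking" then
    io.write("[thinking] " .. chunk.text)
  elseif chunk.kind == "tool_call_done" then
    -- chunk.call holds the fully-parsed tool call
    handle_tool_call(chunk.call)  -- hypothetical handler
  elseif chunk.kind == "stop" then
    print("\nstop reason: " .. chunk.reason)
  end
end
```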

Engine:complete

One-shot completion: creates an ephemeral session, runs a chat, and returns the Response. The session is then discarded; no history is preserved.

Engine:complete(prompt, opts)
prompt string
opts table? `system` (string?) plus any `:chat` option.
→ ion7.llm.Response
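
A one-shot sketch. The documented `system` option is shown; the `text` field on the Response is an assumption about the ion7 API:

```lua
-- No session state survives this call.
local resp = engine:complete("Summarize: the build failed twice, then passed.", {
  system = "You are terse.",  -- documented option
  -- any `:chat` option may also appear here
})
print(resp.text)  -- hypothetical Response accessor
```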