ion7-core / speculative

class

ion7.core.Speculative

_ptr cdata `ion7_speculative_t*` (auto-freed via ffi.gc).
_n_draft integer Per-step max draft tokens.
_draft_buf cdata Reusable buffer for `:draft` output.
_ctx_buf cdata Reusable token-history buffer (grows on demand).
_ctx_buf_sz integer Current capacity of `_ctx_buf` in tokens.

Functions

Speculative.new

Create a speculative-decoding engine.

Speculative.new(ctx_tgt, ctx_dft, opts)
ctx_tgtcdata|ion7.core.ContextTarget context (mandatory).
ctx_dftcdata|ion7.core.Context|nilDraft context, only for `DRAFT`.
optstable?
→ ion7.core.Speculative

raises — When `ion7_speculative_init` returns NULL.

Speculative:begin

(Re)initialise the engine with the current prompt history. Pass an empty table on a fresh conversation — the cache will warm up on subsequent `draft()` calls.

Speculative:begin(tokens)
tokensinteger[]1-based Lua array of token ids.

Speculative:draft

Generate up to `n_draft` predicted tokens for the next step.

Speculative:draft(tokens, last_tok)
tokensinteger[]All generated tokens so far (1-based).
last_tokintegerThe most recently committed token id.
→ integer[]1-based array of draft tokens (possibly empty).

Speculative:accept

Tell the engine how many consecutive draft tokens the target model accepted. Call after every speculative step, even when `0`.

Speculative:accept(n_accepted)
n_acceptedinteger

Speculative:stats

Print acceptance-rate / effective-speedup stats to stderr.

Speculative:stats()

Speculative:free

Explicit release. Idempotent. Disarms the GC finalizer first to avoid a double-free if the GC runs later.

Speculative:free()