Module context.decode

Functions

M.decode, M.decode_single, M.decode_multi, M.encode

M.decode

Decode `n_tokens` tokens, updating the KV cache and producing logits for the LAST token only. When `n_tokens > n_batch`, the function splits the input into chunks automatically, so the caller can pass an arbitrarily long prompt. The return value is the size of the LAST chunk, which is the exclusive upper bound for the `idx` argument of a subsequent `Sampler:sample(ctx:ptr(), idx)` call; because only the final token has logits, the index to sample from is the return value minus 1.

M.decode(self, tokens, n_tokens, seq_id, pos_offset)

`tokens` (cdata|table): Token IDs. Lua tables are 1-based.

`n_tokens` (integer?): Token count. Required for cdata; determined automatically for tables.

`seq_id` (integer?): Sequence ID (default 0).

`pos_offset` (integer?): Starting KV position (default `n_past`).

→ `integer`: Size of the last chunk.

raises — If the KV cache is full, or on any other `llama_decode` error.
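The chunking rule described for `decode` can be modelled in plain Lua. This is a minimal sketch under stated assumptions: splitting is greedy (every chunk holds `n_batch` tokens except possibly the last), and each chunk's first KV position is `pos_offset` plus the number of tokens already fed. `plan_chunks` is a hypothetical helper, not part of this module.

```lua
-- Hypothetical model of decode()'s chunking; plan_chunks is NOT part of the module.
-- Assumption: greedy splitting, so every chunk holds n_batch tokens except the last,
-- and each chunk's first KV position is pos_offset plus the tokens already fed.
local function plan_chunks(n_tokens, n_batch, pos_offset)
  assert(n_tokens > 0 and n_batch > 0, "counts must be positive")
  local chunks = {}
  local start = 0
  while start < n_tokens do
    local size = math.min(n_batch, n_tokens - start)
    chunks[#chunks + 1] = { size = size, pos = pos_offset + start }
    start = start + size
  end
  return chunks
end

-- A 1000-token prompt with n_batch = 512, starting from an empty KV cache.
local chunks = plan_chunks(1000, 512, 0)
local last = chunks[#chunks]
print(string.format("%d chunks, last size %d at pos %d", #chunks, last.size, last.pos))
-- prints: 2 chunks, last size 488 at pos 512

-- Only the final token carries logits; its 0-based index in the last batch:
print(last.size - 1)  -- 487
```

Under these assumptions, `decode` on a 1000-token prompt with `n_batch = 512` would return 488, and the final token's logits would sit at index 487.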

M.decode_single

Decode ONE token, with logits enabled for it. This skips the chunking loop and the table-vs-cdata branch, giving LuaJIT a short, branch-free path to compile.

M.decode_single(self, token, seq_id)

`token` (integer): Token ID to decode.

`seq_id` (integer?): Sequence ID (default 0).

raises — On any `llama_decode` error.
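A typical generation loop feeds each sampled token back through `decode_single`. The sketch below runs on its own because it replaces the real context and sampler with hypothetical stubs (`FakeCtx`, `scripted_sampler` and the `EOS` ID are all made up). It assumes that after a one-token decode the logits sit at batch index 0, which follows from the 0-based batch indexing described for `decode_multi`.

```lua
-- HYPOTHETICAL stand-ins so the loop runs without the real binding.
local FakeCtx = {}
FakeCtx.__index = FakeCtx

function FakeCtx.new()
  return setmetatable({ n_past = 0, fed = {} }, FakeCtx)
end

-- Mimics decode_single: one token in, KV position advances by one.
function FakeCtx:decode_single(token, _seq_id)
  self.fed[#self.fed + 1] = token
  self.n_past = self.n_past + 1
end

-- Returns a fixed script of tokens, one per call.
local function scripted_sampler(script)
  local i = 0
  return function(_ctx, idx)
    assert(idx == 0, "a one-token batch has logits only at index 0")
    i = i + 1
    return script[i]
  end
end

local EOS = 2  -- hypothetical end-of-sequence token ID

-- Feed each sampled token back in until EOS or max_new tokens.
local function generate(ctx, sample, first_token, max_new)
  local out = {}
  local tok = first_token
  for _ = 1, max_new do
    ctx:decode_single(tok)  -- one token, logits enabled for it
    tok = sample(ctx, 0)    -- its logits sit at batch index 0
    if tok == EOS then break end
    out[#out + 1] = tok
  end
  return out
end

local ctx = FakeCtx.new()
local out = generate(ctx, scripted_sampler({ 11, 12, EOS }), 7, 10)
print(table.concat(out, " "))  -- 11 12
```

With the real binding, `sample` would become a call along the lines of `Sampler:sample(ctx:ptr(), 0)`, assuming that wrapper takes the batch index as documented above.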

M.decode_multi

Decode `#tokens` tokens with logits enabled for EVERY position. After this call, `Sampler:sample(ctx:ptr(), i)` is valid for `i` in `0 .. #tokens - 1`; the index is 0-based, so position `i` holds the logits computed after `tokens[i + 1]`. Used by speculative decoding to verify all draft positions in a single forward pass.

M.decode_multi(self, tokens, seq_id)

`tokens` (table): 1-based Lua table of token IDs.

`seq_id` (integer?): Sequence ID (default 0).

raises — If `#tokens` exceeds the batch capacity, or on any `llama_decode` error.
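To illustrate the speculative-decoding use, the sketch below implements greedy draft verification, a common acceptance scheme that is assumed here rather than taken from this module. The batch passed to `decode_multi` is taken to be the last accepted token followed by `k` draft tokens, and the hypothetical `pick(i)` stands in for sampling at 0-based index `i` after the call.

```lua
-- Greedy draft verification: an assumed acceptance scheme, not this module's code.
-- The decode_multi batch is taken to be { last_accepted, draft[1], ..., draft[k] },
-- so the logits at 0-based index i are those computed after batch[i + 1].
-- pick(i) is a HYPOTHETICAL stand-in for sampling at index i after decode_multi.
local function verify(draft, pick)
  local accepted = {}
  for j = 1, #draft do
    local predicted = pick(j - 1)  -- prediction made after batch[j]
    if predicted ~= draft[j] then
      accepted[#accepted + 1] = predicted  -- keep the target's correction, stop
      return accepted
    end
    accepted[#accepted + 1] = draft[j]
  end
  accepted[#accepted + 1] = pick(#draft)  -- every draft matched: bonus token
  return accepted
end

-- Target model's greedy choices per batch index (made-up values).
local target = { [0] = 5, [1] = 6, [2] = 9, [3] = 4 }
local out = verify({ 5, 6, 7 }, function(i) return target[i] end)
print(table.concat(out, " "))  -- 5 6 9
```

Every index `pick` receives lies in `0 .. k`, which is exactly the valid range `0 .. #tokens - 1` for the `k + 1`-token batch.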

M.encode

Encode `n_tokens` tokens through the model's encoder stack, for encoder-decoder architectures (T5, BART, ...). Like `decode`, it chunks long inputs automatically and accepts tokens either as a Lua table or as cdata.

M.encode(self, tokens, n_tokens, seq_id, pos_offset)

`tokens` (cdata|table)

`n_tokens` (integer?)

`seq_id` (integer?)

`pos_offset` (integer?)

raises — On any `llama_encode` error.
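Both `decode` and `encode` accept tokens as a Lua table or as cdata. A minimal sketch of how such a function might normalise the two layouts, assuming cdata arrives as a 0-based FFI array such as `int32_t[n]`; `token_reader` is a hypothetical helper, and the FFI part runs only under LuaJIT.

```lua
-- HYPOTHETICAL helper: returns the token count and a 0-based reader for either layout.
local function token_reader(tokens, n_tokens)
  if type(tokens) == "table" then
    -- Lua tables are 1-based, so C-style index i maps to tokens[i + 1].
    -- Assumption: an explicit n_tokens, if given, overrides #tokens.
    local n = n_tokens or #tokens
    return n, function(i) return tokens[i + 1] end
  end
  -- cdata arrays are already 0-based but carry no length, hence the required count.
  assert(n_tokens, "n_tokens is required for cdata tokens")
  return n_tokens, function(i) return tokens[i] end
end

local n, at = token_reader({ 101, 102, 103 })
print(n, at(0), at(2))  -- 3, 101 and 103 (tab-separated)

-- The cdata path needs LuaJIT's FFI; it is skipped on other Lua implementations.
local has_ffi, ffi = pcall(require, "ffi")
if has_ffi then
  local buf = ffi.new("int32_t[3]", { 101, 102, 103 })
  local m, at2 = token_reader(buf, 3)
  print(m, at2(0), at2(2))
end
```

The asymmetry explains the parameter table for `decode`: a table knows its own length, while a raw cdata pointer does not, so the caller must supply `n_tokens`.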