module
context.decode
Functions
M.decode
Decode `n_tokens` tokens, updating the KV cache and producing logits for the LAST token only. The function chunks automatically when `n_tokens > n_batch`, so the caller can pass an arbitrarily long prompt. The return value is the size of the LAST chunk, which is the upper bound for the `idx` argument of any subsequent `Sampler:sample(ctx:ptr(), idx)` call.
raises — On KV-full or any other `llama_decode` error.
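The chunking arithmetic described for `M.decode` can be sketched in plain Lua. This is only an illustration of how a prompt longer than `n_batch` would split and which size comes back as the last chunk. `plan_chunks` is a hypothetical helper, not a function of this module, and the real implementation may differ in detail.

```lua
-- Hypothetical helper (not part of the module): split n_tokens into chunks
-- of at most n_batch tokens and report the size of the final chunk.
local function plan_chunks(n_tokens, n_batch)
  assert(n_tokens > 0 and n_batch > 0, "sizes must be positive")
  local sizes = {}
  local remaining = n_tokens
  while remaining > 0 do
    local n = math.min(remaining, n_batch)  -- never exceed the batch capacity
    sizes[#sizes + 1] = n
    remaining = remaining - n
  end
  -- The second value mirrors what M.decode is documented to return.
  return sizes, sizes[#sizes]
end

local sizes, last = plan_chunks(10, 4)
-- sizes is { 4, 4, 2 } and last is 2
```

Under this reading, a 10-token prompt with `n_batch = 4` ends on a 2-token chunk, so 2 is the bound to keep in mind for `idx` afterwards.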
M.decode_single
Decode ONE token, with logits enabled for it. Skips the chunking loop and the table-vs-cdata branch, which keeps this path simple for the JIT compiler.
raises — On any `llama_decode` error.
M.decode_multi
Decode `#tokens` tokens with logits enabled for EVERY position. After this call, `Sampler:sample(ctx:ptr(), i)` is valid for `i` in `0 .. #tokens - 1`. Used by speculative decoding to verify all draft positions in a single forward pass.
raises — When `#tokens` exceeds the batch capacity, or on decode error.
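To show why logits at every position matter for speculative decoding, here is a minimal sketch of greedy draft verification, one common scheme and not necessarily what callers of this module do. It makes two assumptions: `sample_at(i)` is a hypothetical stand-in for `Sampler:sample(ctx:ptr(), i)`, and the batch passed to `M.decode_multi` is laid out as `{ last_accepted, draft[1], ..., draft[k] }`, so the logits at 0-based index `i` predict the token that follows batch position `i`.

```lua
-- Hypothetical helper (not part of the module): greedy verification of
-- draft tokens against the target model's choice at each batch position.
-- sample_at(i) stands in for Sampler:sample(ctx:ptr(), i), 0-based.
local function accept_drafts(draft, sample_at)
  local accepted = {}
  for i = 1, #draft do
    local target = sample_at(i - 1)  -- target model's choice for this slot
    if target ~= draft[i] then
      -- First mismatch: keep the target's token, discard remaining drafts.
      accepted[#accepted + 1] = target
      return accepted
    end
    accepted[#accepted + 1] = draft[i]
  end
  -- Every draft matched: the last position yields one extra token.
  accepted[#accepted + 1] = sample_at(#draft)
  return accepted
end

-- Fake sampler for illustration: the target disagrees at the third draft.
local target_choice = { [0] = 11, [1] = 22, [2] = 99, [3] = 44 }
local out = accept_drafts({ 11, 22, 33 }, function(i) return target_choice[i] end)
-- out is { 11, 22, 99 }
```

With `k` drafts the batch holds `k + 1` tokens, so the indices used here, `0 .. k`, stay within the documented valid range `0 .. #tokens - 1`.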
M.encode
Encode `n_tokens` tokens through the model's encoder stack. Used by encoder-decoder architectures (T5, BART, ...). Like `decode`, it handles chunking and accepts both table and cdata token layouts.
raises — On any `llama_encode` error.