module
context.logits
Functions
M.logits
Raw logits buffer for batch position `idx`. Returns a `float*` cdata that is OWNED by the context — do not free, do not retain past the next decode.
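A minimal sketch of the copy-before-reuse pattern this implies, assuming the function is exposed as `ctx:logits(idx)` and that a model handle offers an `n_vocab()` accessor (both names are illustrative, check your binding's actual surface):

```lua
-- Copy the raw logits out before the next decode invalidates the buffer.
local function copy_logits(ctx, model, idx)
  local raw = ctx:logits(idx)      -- float* cdata, OWNED by the context
  if raw == nil then return nil end
  local n = model:n_vocab()
  local out = {}
  for i = 0, n - 1 do
    out[i + 1] = raw[i]            -- cdata is 0-based; Lua tables are 1-based
  end
  return out                       -- safe to keep past the next decode
end
```

The copy loop is the price of safety: holding the raw pointer across a decode is use-after-invalidate.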
M.sampled_token
Sampled token at batch position `i`, or `-1` if not available. (Set by a sampler attached via `llama_set_sampler`; see `Context:set_sampler`.)
M.sampled_probs
Probability array for the sampled token at position `i`.
M.sampled_probs_count
Count matching `sampled_probs(i)`.
M.sampled_logits
Logit array for the sampled candidates at position `i`.
M.sampled_logits_count
Count matching `sampled_logits(i)`.
M.sampled_candidates
Candidate-token array at position `i`.
M.sampled_candidates_count
Count matching `sampled_candidates(i)`.
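A sketch tying the `sampled_*` accessors together, assuming method-style access on the context (`ctx:sampled_token(i)` etc., mirroring the names above):

```lua
-- Inspect what the attached sampler produced at batch position i.
local function inspect_sampled(ctx, i)
  local tok = ctx:sampled_token(i)
  if tok == -1 then return nil end          -- nothing sampled here
  local cands = ctx:sampled_candidates(i)   -- candidate token ids (cdata)
  local probs = ctx:sampled_probs(i)        -- matching probabilities
  local n     = ctx:sampled_probs_count(i)
  for k = 0, n - 1 do
    print(string.format("cand=%d p=%.4f", cands[k], probs[k]))
  end
  return tok
end
```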
prepare_softmax
Internal: numerically stable log-softmax preparation. Returns `(logits, n, max_l, sum)` for reuse by the public methods. When the batch has no logits at `idx`, `logits` is nil.
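The log-sum-exp trick behind it, shown as a pure-Lua illustration (the real implementation works on the `float*` cdata buffer):

```lua
-- Subtract the max logit before exponentiating so exp() never overflows.
local function prepare_softmax(logits, n)
  if logits == nil then return nil end
  local max_l = -math.huge
  for i = 0, n - 1 do
    if logits[i] > max_l then max_l = logits[i] end
  end
  local sum = 0
  for i = 0, n - 1 do
    sum = sum + math.exp(logits[i] - max_l)  -- every term is <= 1
  end
  return logits, n, max_l, sum
end
-- From these: log P(token) = logits[token] - max_l - math.log(sum)
```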
M.logprob
Log-probability of `token_id` given the logits at batch position `idx`. Returns `-math.huge` when the token is out of vocab range or the batch slot has no logits.
M.entropy
Shannon entropy (in nats) of the logit distribution at batch position `idx`. Returns `0` when no logits are available.
M.logprob_entropy
Combined `logprob` and `entropy` in ONE pass over `n_vocab`. Use when both metrics are needed — halves the work compared to two separate calls.
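A usage sketch, assuming `logprob_entropy` returns the pair `(logprob, entropy)` as two values (the return shape is an assumption; verify against the binding):

```lua
-- Score a forced token and measure distribution sharpness in one vocab pass.
local lp, H = ctx:logprob_entropy(idx, token_id)
if lp == -math.huge then
  print("token out of vocab range, or no logits at idx")
else
  print(string.format("logprob=%.3f  entropy=%.3f nats", lp, H))
end
```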
M.embedding_ptr
Raw embedding cdata pointer for a sequence (zero-copy). Falls back to the per-batch embedding when the sequence-keyed accessor returns nil. Caller must NOT free and must NOT retain past the next decode.
M.embedding
Pooled embedding as a Lua table (1-based, copies the floats out so the caller can hold it past the next decode).
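A sketch contrasting the two accessors: the zero-copy pointer for immediate math, the table copy for anything kept around. Assumes `seq_id` and a `model:n_embd()` accessor (illustrative name):

```lua
-- Zero-copy path: valid only until the next decode.
local ptr  = ctx:embedding_ptr(seq_id)     -- float* cdata, do NOT free
local n    = model:n_embd()
local norm = 0
for i = 0, n - 1 do norm = norm + ptr[i] * ptr[i] end
norm = math.sqrt(norm)

-- Copying path: a 1-based Lua table, safe to store indefinitely.
local emb = ctx:embedding(seq_id)
vectors[#vectors + 1] = emb
```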
M.set_control_vector
Apply a control vector (activation steering) to this context. Effective on the next decode. Pass either a Lua table (we'll copy to a float buffer) or a pre-built cdata `float*`.
M.clear_control_vector
Remove any previously-applied control vector.
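A steering sketch. The expected vector layout (`n_embd` floats per steered layer, as llama.cpp's control-vector API takes) is an assumption here; check the binding docs for the exact shape:

```lua
-- Build a control vector as a plain Lua table; the binding copies it
-- into a float buffer. Takes effect on the next decode.
local cv = {}
for i = 1, n_embd * n_layers do
  cv[i] = steering[i] or 0
end
ctx:set_control_vector(cv)
-- ... decode with activation steering active ...
ctx:clear_control_vector()   -- back to unsteered behavior
```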
M.set_sampler
Attach a sampler to a specific sequence. After this, `llama_decode` will sample automatically using `sampler` whenever it processes `seq_id`. EXPERIMENTAL upstream — the surface may shift.
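A sketch of the attach flow, assuming the binding exposes a sampler constructor (the `llama.Sampler.greedy()` name below is illustrative, not confirmed by this module):

```lua
-- Let llama_decode sample automatically for sequence 0.
local sampler = llama.Sampler.greedy()
ctx:set_sampler(0, sampler)        -- EXPERIMENTAL upstream; surface may shift
-- Subsequent decodes populate sampled_token(i) / sampled_probs(i)
-- for batch positions belonging to seq 0.
```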
M.attach_threadpool
Attach a CPU threadpool to this context. `tp` may be either a raw `ggml_threadpool_t` cdata or a `Threadpool` instance. The second pool slot is reserved for batch operations and defaults to NULL, which tells llama.cpp to fall back to the primary pool internally. Do not pass the same pool in both slots via two separate attach calls: that corrupts the pool's wait state.
M.detach_threadpool
Detach the current threadpool (reverts to llama.cpp's internal one).
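A sketch of the attach/detach lifecycle, assuming a `Threadpool` constructor in the binding (name illustrative):

```lua
-- Pin decode work to a dedicated CPU threadpool.
local tp = llama.Threadpool.new(8)   -- e.g. 8 worker threads
ctx:attach_threadpool(tp)            -- batch slot stays NULL: llama.cpp
                                     -- falls back to tp internally
-- ... decodes run on tp ...
ctx:detach_threadpool()              -- revert to llama.cpp's internal pool
```

Note the single attach call: passing the same pool into both slots yourself is the wait-state corruption hazard described above.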
M.perf_print
Print perf counters to stderr.
M.perf_reset
Reset every perf counter to zero.
M.perf
Read the perf counters into a Lua table.
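A sketch of measuring a region with the three perf entry points. The field names inside the returned table are not specified here, so the code just dumps whatever is present:

```lua
-- Bracket a workload with reset/read; print to stderr for comparison.
ctx:perf_reset()
-- ... run some decodes ...
local p = ctx:perf()
for k, v in pairs(p) do
  print(k, v)                -- inspect available counter names in your build
end
ctx:perf_print()             -- same counters, formatted to stderr
```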
M.warmup
Run a single dummy decode through this context to force the GPU backend to JIT-compile its shaders. After the call the KV cache is wiped and `n_past` is reset, leaving the context in the same state as right after creation, except that the next real decode no longer pays the cold-shader penalty. Call it once, right after `model:context()` and before any real work, to shave 1-3 s off the time-to-first-token of the first real request.
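The intended call site, as a sketch:

```lua
-- Pay the shader-compile cost up front, before serving traffic.
local ctx = model:context()
ctx:warmup()    -- dummy decode; KV cache wiped and n_past reset afterwards
-- The first real decode now skips the cold-shader penalty.
```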