class
ion7.core.Context
Functions
Context.new
Wrap a raw `llama_context*` returned by `llama_init_from_model`. Prefer `model:context()` over calling this directly — it does the params dance and the OOM-retry cascade for you.
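A minimal sketch of both paths, assuming a hypothetical `ion7.load_model` loader (not part of this class):

```lua
local ion7 = require("ion7")
local model = ion7.load_model("model.gguf")  -- hypothetical loader

-- Preferred: the wrapper fills in llama_context_params and retries on OOM.
local ctx = model:context()

-- Escape hatch: wrap a llama_context* you created through the FFI yourself.
-- local ctx2 = ion7.core.Context.new(raw_llama_context_ptr)
```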
Context:ptr
Return the raw `llama_context*` cdata pointer (used by samplers and any FFI call that needs the context handle).
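For example, feeding the handle to a llama.cpp sampler (here `C` is assumed to be the FFI namespace bound to libllama, and `smpl` a `llama_sampler*` built elsewhere):

```lua
-- -1 samples from the logits of the last token in the previous decode.
local tok = C.llama_sampler_sample(smpl, ctx:ptr(), -1)
```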
Context:memory
Return the cached `llama_memory_t` accessor for this context. Re-calling `llama_get_memory` on every KV op would be wasteful — we cache it once at construction.
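A sketch of the kind of KV op this accessor feeds; `llama_memory_seq_rm` is the llama.cpp call for evicting a position range (verify the signature against your headers):

```lua
local mem = ctx:memory()
-- Drop sequence 0's KV entries from position 128 to the end (-1 = open-ended).
C.llama_memory_seq_rm(mem, 0, 128, -1)
```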
Context:free
Explicitly free the context (and its batch buffers) immediately. Idempotent. Normally the GC handles this; call it manually inside tight benchmark loops to avoid accumulating dead VRAM allocations between iterations.
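The benchmark pattern the note has in mind, with a hypothetical `run_once` workload:

```lua
for i = 1, 100 do
  local ctx = model:context()
  run_once(ctx)  -- hypothetical: decode some fixed workload
  ctx:free()     -- release VRAM now rather than whenever the GC runs
end
```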
Context:n_ctx
Total context size of this context, in tokens.
Context:n_batch
Maximum logical batch size (tokens per `llama_decode` call).
Context:n_ubatch
Maximum physical micro-batch size the backend processes at once.
Context:n_seq_max
Maximum number of distinct sequence IDs the context supports.
Context:n_ctx_seq
Context size available to each individual sequence.
Context:n_threads
Thread count used for single-token generation.
Context:n_threads_batch
Thread count used for batch (prompt) processing.
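All of the above are plain reads, e.g. to log a context's decode geometry:

```lua
print(("n_ctx=%d n_batch=%d n_ubatch=%d n_seq_max=%d"):format(
  ctx:n_ctx(), ctx:n_batch(), ctx:n_ubatch(), ctx:n_seq_max()))
```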
Context:set_n_threads
Update the thread counts on a live context — no recreate required.
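Assuming the wrapper mirrors the `llama_set_n_threads(ctx, n_threads, n_threads_batch)` argument order (an assumption worth checking):

```lua
-- Fewer threads for token-by-token generation, more for prompt processing.
ctx:set_n_threads(4, 8)
```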
Context:pooling_type
Symbolic pooling strategy of the context, e.g. `"mean"` for an embedding context. See `POOLING_NAMES` for the full mapping.
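For example, guarding an embedding read on the strategy (assuming `POOLING_NAMES` maps `LLAMA_POOLING_TYPE_NONE` to `"none"`):

```lua
if ctx:pooling_type() ~= "none" then
  -- pooled per-sequence embeddings are available here
end
```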
Context:set_embeddings
Toggle embedding extraction mode at runtime.
Context:set_causal_attn
Toggle causal attention. Pass `false` to use bidirectional attention (the mode used by encoder-style embedding models).
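Put together, flipping a live context into encoder-style embedding mode looks roughly like:

```lua
ctx:set_embeddings(true)    -- extract embeddings instead of logits
ctx:set_causal_attn(false)  -- bidirectional attention, encoder style
-- ... decode, read the embeddings, then flip both back for generation ...
```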
Context:set_warmup
Mark the context as "in warmup" so llama.cpp does not pollute its perf counters with the dummy decode that backends use to JIT-compile shaders and warm their caches. See `Context:warmup()` for the high-level helper.
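A sketch of what the high-level helper presumably wraps (the dummy decode itself is elided):

```lua
ctx:set_warmup(true)   -- keep the dummy pass out of the perf counters
-- ... run a throwaway decode so backends JIT-compile their shaders ...
ctx:set_warmup(false)
```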
Context:synchronize
Block until every async GPU command queued so far has finished. Useful before reading logits or tensors out of a backend buffer.
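Typical use, sketched with the standard llama.cpp FFI calls (`batch` is a `llama_batch` built elsewhere):

```lua
C.llama_decode(ctx:ptr(), batch)  -- may return before GPU work completes
ctx:synchronize()                 -- block until the backend is idle
local logits = C.llama_get_logits(ctx:ptr())  -- now safe to read
```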
Context:set_abort_callback
Register an abort callback that llama.cpp will poll periodically during a decode. Returning `true` from the callback aborts.
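Assuming the wrapper accepts a plain Lua function (an assumption), a deadline-based abort looks like:

```lua
local budget = os.clock() + 5.0  -- roughly 5 s of CPU time
ctx:set_abort_callback(function()
  return os.clock() > budget  -- returning true aborts the in-flight decode
end)
```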
Context:n_past
Return the Lua-side mirror of how many tokens have been decoded into the KV cache so far.
Context:set_n_past
Manually realign the Lua-side `n_past` mirror after a state restore (when llama.cpp resumes from a snapshot it knows the position but we don't). Most callers should NOT need this.
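For example, after restoring a snapshot (`restore_state` is a hypothetical helper that returns the number of tokens restored):

```lua
local n = restore_state(ctx, snapshot)  -- hypothetical restore helper
ctx:set_n_past(n)                       -- realign the Lua-side mirror
assert(ctx:n_past() == n)
```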