ion7-core / vocab

class

ion7.core.Vocab

_ptr cdata `const llama_vocab*` (not owned).
_model_ref table Parent Model — keeps the model alive.
_tmpls cdata? `common_chat_templates*` (owned, GC-freed).
_piece_buf cdata Pre-allocated 256-byte buffer for `piece()`.
_piece_big cdata Pre-allocated 4-KB fallback buffer.
_piece_cache table Memoised `[token_id] = piece_string` cache.
_tmpl_buf cdata Pre-allocated buffer for chat templates.
_dtok_buf cdata Pre-allocated buffer for detokenisation.

Functions

Vocab.new

Wrap a raw `llama_vocab*` returned by `llama_model_get_vocab`. The caller MUST keep the parent Model alive at least as long as the Vocab — we anchor it as `self._model_ref` for that very purpose. The chat-templates handle is initialised eagerly. If the bridge shared library is missing, the `require` of `ion7.core.ffi.bridge` would have failed at module load already — but we degrade gracefully here by checking the call result, leaving `_tmpls = nil` when initialisation fails (e.g. for a model with no embedded template). Calls to `apply_template` will then raise a clear error.

Vocab.new(model, ptr)
model ion7.core.Model Parent model.
ptr cdata `const llama_vocab*`.
→ ion7.core.Vocab
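Typical construction, sketched below. The module path and the way the raw handle is obtained from the parent `Model` are assumptions for illustration; only `Vocab.new` and `llama_model_get_vocab` come from this reference.

```lua
local Vocab = require("ion7.core.vocab")  -- module path assumed

-- `model` is an ion7.core.Model; direct `_ptr` access is illustrative only.
local raw = lib.llama_model_get_vocab(model._ptr)
local vocab = Vocab.new(model, raw)
-- `vocab._model_ref` now anchors `model`, so the parent cannot be
-- garbage-collected while this Vocab is alive.
```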

Vocab:n_vocab

Vocab:n_vocab()
→ integer Total number of tokens in the vocabulary.

Vocab:n_tokens

Backwards-compatible alias for `n_vocab`.

Vocab:n_tokens()

Vocab:type

Vocab:type()
→ string Vocab kind: `"spm"`, `"bpe"`, `"wpm"`, `"ugm"`, `"rwkv"`, or `"none"`.

Vocab:tokenize

Tokenise UTF-8 `text` into an int32 cdata array. `llama_tokenize` returns the negated required size when our buffer is too small; we honour that contract by reallocating once and retrying. The initial heuristic (1 token per byte + 16 specials) is a generous overestimate that almost always avoids the retry.

Vocab:tokenize(text, add_special, parse_special)
text string UTF-8 input.
add_special boolean? Add BOS/EOS tokens (default false).
parse_special boolean? Parse `<...>` special tokens (default true).
→ cdata `int32_t[?]` token array (0-based).
→ integer Token count.

raises — When tokenisation fails after the retry.
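The negated-size contract described above can be sketched as follows. This is a simplified illustration of the retry logic, not the module's actual implementation; the `lib` handle and error message are assumptions.

```lua
-- Heuristic: one slot per input byte plus headroom for special tokens.
local cap = #text + 16
local buf = ffi.new("int32_t[?]", cap)
local n = lib.llama_tokenize(self._ptr, text, #text, buf, cap,
                             add_special, parse_special)
if n < 0 then
  -- llama_tokenize returned the negated required size: grow once, retry.
  cap = -n
  buf = ffi.new("int32_t[?]", cap)
  n = lib.llama_tokenize(self._ptr, text, #text, buf, cap,
                         add_special, parse_special)
  if n < 0 then error("tokenization failed") end
end
return buf, n
```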

Vocab:detokenize

Detokenise back into a UTF-8 Lua string. Uses the pre-allocated `_dtok_buf` (64 KB). For exceptionally long outputs we retry once with a heap-allocated buffer sized to the exact requested capacity.

Vocab:detokenize(tokens, n, remove_special, unparse_special)
tokens cdata `int32_t[?]` token array (0-based).
n integer Token count.
remove_special boolean? Strip BOS/EOS (default false).
unparse_special boolean? Convert specials back to text (default false).
→ string
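A round trip through `tokenize`/`detokenize` looks like this (illustrative usage; the input string is arbitrary):

```lua
local toks, n = vocab:tokenize("Hello, world!", true, true)
local text = vocab:detokenize(toks, n, true, false)
-- With remove_special = true, the BOS token added during tokenisation
-- is stripped, so `text` should match the original input.
```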

Vocab:piece

Convert a single token to its visible text piece. Memoised: the second call with the same token id returns a cached Lua string.

Vocab:piece(token, special)
token integer Token id.
special boolean? Render special-token text too (default true).
→ string

Vocab:text

Raw text representation of a token (no lstrip / SPM whitespace normalisation). Empty string if the token has no text view.

Vocab:text(token)
token integer
→ string

Vocab:score

Float score of a token (only meaningful for SPM/Unigram vocabs).

Vocab:score(token)
token integer
→ number

Vocab:attr

Attribute bitmask for a token (`LLAMA_TOKEN_ATTR_*`).

Vocab:attr(token)
token integer
→ integer

Vocab:is_eog

True for end-of-generation tokens (EOS, EOT, custom STOP, ...).

Vocab:is_eog(token)
token integer
→ boolean
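`is_eog` and `piece` together form the usual streaming loop. A minimal sketch, assuming some `next_token()` sampling step that is not part of this module:

```lua
-- Hypothetical decode loop: next_token() stands in for sampling.
while true do
  local tok = next_token()
  if vocab:is_eog(tok) then break end   -- EOS / EOT / custom stop
  io.write(vocab:piece(tok, true))      -- memoised token-to-text lookup
end
```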

Vocab:is_control

True for control / special tokens.

Vocab:is_control(token)
token integer
→ boolean

Vocab:bos

Beginning-of-sequence token id, or `-1` if the model has none.

Vocab:bos()

Vocab:eos

End-of-sequence token id, or `-1` if absent.

Vocab:eos()

Vocab:eot

End-of-turn token id (chat models).

Vocab:eot()

Vocab:nl

Newline token id, or `-1`.

Vocab:nl()

Vocab:pad

Padding token id.

Vocab:pad()

Vocab:sep

Sentence-separator token id.

Vocab:sep()

Vocab:mask

Mask token id (for masked-LM models).

Vocab:mask()

Vocab:fim_pre

Fill-in-the-Middle prefix token id. `-1` when the model has no FIM.

Vocab:fim_pre()

Vocab:fim_suf

FIM suffix token id.

Vocab:fim_suf()

Vocab:fim_mid

FIM middle token id.

Vocab:fim_mid()

Vocab:fim_pad

FIM padding token id.

Vocab:fim_pad()

Vocab:fim_rep

FIM repository-marker token id.

Vocab:fim_rep()

Vocab:fim_sep

FIM separator token id.

Vocab:fim_sep()

Vocab:get_add_bos

True if the vocab automatically prepends BOS during tokenisation.

Vocab:get_add_bos()

Vocab:get_add_eos

True if the vocab automatically appends EOS during tokenisation.

Vocab:get_add_eos()

Vocab:get_add_sep

True if the vocab automatically appends a sentence separator.

Vocab:get_add_sep()

Vocab:builtin_templates

Names of the chat templates llama.cpp ships with (separate from the per-model template stored inside the GGUF). Useful for debugging/introspection; production code should generally rely on `apply_template`, which uses the model's own template.

Vocab:builtin_templates()
→ string[] Template name strings.

Vocab:apply_template

Apply the model's embedded Jinja2 chat template to a sequence of messages. Returns the formatted prompt ready to feed to `Vocab:tokenize`. The return string excludes the trailing NUL byte (that's what `needed - 1` is computing — `ion7_chat_templates_apply` reports the byte count INCLUDING NUL).

Vocab:apply_template(messages, add_ass, enable_thinking)
messages table[] Array of `{ role = string, content = string }`.
add_ass boolean? Append assistant generation prefix (default true).
enable_thinking integer? `-1` model default, `0` off, `1` on.
→ stringFormatted prompt.

raises — When the template engine fails (missing `_tmpls` or runtime error).
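Typical use, feeding the formatted prompt straight back into `tokenize` (message contents are illustrative):

```lua
local messages = {
  { role = "system", content = "You are a helpful assistant." },
  { role = "user",   content = "Explain BPE in one sentence." },
}
-- add_ass = true appends the assistant prefix; -1 keeps the model's
-- default thinking behaviour.
local prompt = vocab:apply_template(messages, true, -1)
local toks, n = vocab:tokenize(prompt, true, true)
```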

Vocab:supports_thinking

True if the embedded template recognises `enable_thinking` (Qwen3, DeepSeek-R1 et al.).

Vocab:supports_thinking()
→ boolean