ion7-core / utf8

module

utf8

Functions

M.seq_len

Return the expected total byte length of the UTF-8 sequence starting with `b`.

M.seq_len(b)
b (integer): Byte value (0-255) of the leading byte.
→ integer: 1, 2, 3 or 4 for valid leading bytes; 0 for
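
The mapping from leading byte to sequence length can be sketched as below. This is a minimal illustration based on the standard UTF-8 leading-byte bit patterns, not the module's actual source. The local name `seq_len` is a stand-in for `M.seq_len`, and returning 0 for continuation bytes (0x80-0xBF) and for 0xF8-0xFF is an assumption about what the module treats as invalid.

```lua
-- Sketch only: a stand-in for M.seq_len, assuming standard UTF-8 leader ranges.
local function seq_len(b)
  if b < 0x80 then return 1 end  -- 0xxxxxxx: single-byte (ASCII)
  if b < 0xC0 then return 0 end  -- 10xxxxxx: continuation byte, not a leader (assumed 0)
  if b < 0xE0 then return 2 end  -- 110xxxxx: leader of a 2-byte sequence
  if b < 0xF0 then return 3 end  -- 1110xxxx: leader of a 3-byte sequence
  if b < 0xF8 then return 4 end  -- 11110xxx: leader of a 4-byte sequence
  return 0                       -- 0xF8-0xFF: never valid in UTF-8 (assumed 0)
end

print(seq_len(string.byte("A")))  -- ASCII letter
print(seq_len(0xE2))              -- first byte of a 3-byte character such as U+20AC
```

Plain comparisons are used instead of bit operations so the sketch runs unchanged on Lua 5.1, LuaJIT and later versions. It does not reject the overlong leaders 0xC0 and 0xC1; whether the real function does is not stated here.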

M.is_complete

Return `true` if `buf` ends on a complete UTF-8 character boundary, `false` otherwise. Empty buffers count as complete. This is the streaming-friendly check: it walks the buffer from leading byte to leading byte, advancing by the expected sequence length at each step. As soon as a sequence is malformed (the byte at a leader position has an expected length of 0) or truncated (the advance would go past `#buf`), it reports incomplete and stops. The implementation does NOT validate continuation bytes: it trusts that a valid leader is followed by `seq-1` bytes of the form `10xx_xxxx`. This matches the historical bridge behaviour and is sufficient for splitting llama.cpp's tokenised output.

M.is_complete(buf)
buf (string|nil): Byte buffer (typically a partial token stream).
→ boolean
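
The walk described for `M.is_complete` can be sketched as follows. This is a hedged illustration of the documented behaviour, not the module's source. The local `seq_len` is a hypothetical stand-in built from the standard UTF-8 leader ranges, and treating a `nil` buffer the same as an empty one is an assumption, since only empty buffers are documented as complete.

```lua
-- Hypothetical stand-in for M.seq_len (standard UTF-8 leader ranges assumed).
local function seq_len(b)
  if b < 0x80 then return 1 end
  if b < 0xC0 then return 0 end  -- continuation byte in leader position
  if b < 0xE0 then return 2 end
  if b < 0xF0 then return 3 end
  if b < 0xF8 then return 4 end
  return 0
end

-- Sketch of the documented walk; continuation bytes are deliberately not checked.
local function is_complete(buf)
  if buf == nil or #buf == 0 then return true end  -- nil-as-empty is an assumption
  local i, n = 1, #buf
  while i <= n do
    local len = seq_len(buf:byte(i))
    if len == 0 then return false end          -- malformed: no valid leader here
    if i + len - 1 > n then return false end   -- truncated: sequence runs past #buf
    i = i + len                                -- jump to the next leading byte
  end
  return true
end

-- "é" is U+00E9, encoded as the two bytes 0xC3 0xA9.
local partial = "h" .. string.char(0xC3)
local whole   = "h" .. string.char(0xC3, 0xA9)
print(is_complete(partial))  --> false
print(is_complete(whole))    --> true
```

A streaming caller would typically hold back output while the check returns `false` and flush once the next chunk completes the character. Because continuation bytes are skipped rather than inspected, a buffer such as `string.char(0xC3) .. "A"` is reported complete by this sketch, which mirrors the limitation stated above.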