ricelang

Language identification, tokenization, and Zawgyi/Unicode conversion for 50+ Southeast and South Asian languages.

A revamp of pyidaungsu: ISO 639-3 labels, 17× more languages, reproducible training, uv-native.

At a glance

50+ languages: 25 trained labels + 27 script-rule freebies
99.85% P@1 on held-out validation
1.8 MB detection model, ~13 MB full bundle
Local & offline: no API key, no GPU, no network
Returns None for out-of-scope text — no hallucinated answers
ricelang CLI ships with the Python package

Install

With pip:

pip install ricelang

With uv:

uv add ricelang

One-minute taste

import ricelang as rl

rl.detect("ထမင်းစားပြီးပြီလား")       # 'mya'   (Burmese)
rl.detect("안녕하세요")                # 'kor'   (Korean — via script rule)
rl.detect("🎉")                       #  None    (out of scope)

rl.tokenize("ဖေဖေနဲ့မေမေ၏ကျေးဇူး", form="word")
# ['ဖေဖေ', 'နဲ့', 'မေမေ', '၏', 'ကျေးဇူး']

rl.cvt2zg("ထမင်းစားပြီးပြီလား")        # → Zawgyi
rl.cvt2uni("ထမင္းစားၿပီးၿပီလား")        # → Unicode

…or from the shell:

ricelang detect "ထမင်းစားပြီးပြီလား"      # mya
echo "hello" | ricelang detect -          # None  (no rule for English-only short text)

When to use ricelang

ricelang is the right tool when you have text from Southeast or South Asia and need:

Language identification for routing/filtering pipelines
Tokenization that respects script and morphology (Burmese syllables, BPE subwords for downstream embeddings)
Zawgyi → Unicode normalization of legacy Burmese text
A small, deterministic, offline building block that's not an LLM
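
The routing and filtering use case can be sketched roughly as follows. This is a minimal illustration, not part of ricelang: the helper `route_by_language` and the "unknown" bucket name are hypothetical. The only assumption about the library is the one shown in the examples above, namely that `rl.detect` takes a string and returns a label or None. The detector is passed in as an argument so the helper stays independent of how ricelang is imported.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

# Any callable that maps text to a language label, or None when out of scope.
Detector = Callable[[str], Optional[str]]


def route_by_language(
    texts: Iterable[str],
    detect: Detector,
    unknown: str = "unknown",  # hypothetical bucket name for None results
) -> Dict[str, List[str]]:
    """Group texts by detected label; None results go to the `unknown` bucket."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for text in texts:
        label = detect(text)
        key = unknown if label is None else label
        buckets[key].append(text)
    return dict(buckets)
```

With ricelang installed, the intended call would be something like `route_by_language(lines, rl.detect)`. Keeping out-of-scope text in its own bucket, rather than dropping it, makes it easy to inspect what the detector declined to label.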

When not to use ricelang

Generic worldwide language detection — ricelang is specialised for SE/South Asia + East Asian script families. Use fasttext lid.176 or cld3 as an upstream router if you process arbitrary global text.
Anything requiring reasoning — translation, summarization, structured extraction. That's LLM territory.
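
The upstream-router arrangement from the first point might look like the sketch below. Everything here is an assumption made for illustration: `detect_global`, the `handoff` set, and the two callables are hypothetical names, and the wrapper around fasttext lid.176 or cld3 is left to the reader because each tool has its own label scheme. The idea is that the global detector runs first, and text it assigns to a language worth refining (or cannot label at all) is handed to the regional detector, for example `rl.detect`.

```python
from typing import Callable, Optional, Set, Tuple

# Any callable that maps text to a language label, or None when undecided.
Detector = Callable[[str], Optional[str]]


def detect_global(
    text: str,
    upstream: Detector,   # hypothetical wrapper around a worldwide detector
    regional: Detector,   # e.g. rl.detect, assumed to return a label or None
    handoff: Set[str],    # upstream labels that should be re-checked regionally
) -> Tuple[Optional[str], str]:
    """Return (label, source), where source names the detector that answered."""
    coarse = upstream(text)
    if coarse is None or coarse in handoff:
        fine = regional(text)
        if fine is not None:
            return fine, "regional"
    # Either the text was not handed off, or the regional detector declined.
    return coarse, "upstream"
```

The returned source tag matters because the two detectors may not share a label scheme (ricelang uses ISO 639-3 labels), so downstream code can tell which scheme a given label belongs to.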

Next steps

Install — pip / uv / from source
CLI — every subcommand with examples
Detection guide — Python API
How the detector works — script rule + ML hybrid
Supported languages — full label table
Migrating from pyidaungsu