— The Pipeline
Five stages.
One class of error each.
CORAL is not a model. It is a chain of deterministic algorithms followed by a single LLM stage, and each stage is engineered to neutralise a specific failure mode in Urdu ASR.
Urdu Normalisation
Zero-risk Unicode unification
What it does
Stage 0 collapses Arabic-form code-points onto canonical Urdu, strips combining diacritics, harmonises hamza placement, removes tatweel, and normalises punctuation.
Why it matters
A substantial fraction of apparent ASR errors aren't recognition failures — they are measurement artefacts caused by Arabic↔Urdu Unicode variants on either side of the comparison.
Mechanism
- Arabic Yeh ي → Urdu Yeh ی
- Arabic Kaf ك → Urdu Kaf ک
- Strip ZWJ + Shadda + Tatweel
- Normalise whitespace + punct
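The mapping above can be sketched in a few lines of Python. This is a minimal illustration and not CORAL's actual code: the function name `normalise_urdu` is hypothetical, only the two letter mappings named above are included, and hamza harmonisation and punctuation handling are left out. Dropping every character in Unicode category `Mn` is a blunt stand-in for "strip combining diacritics" (it covers Shadda, U+0651, along with the short vowels).

```python
import re
import unicodedata

# Only the two mappings named in the text; a real table would be longer.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic Yeh -> Urdu (Farsi) Yeh
    "\u0643": "\u06A9",  # Arabic Kaf -> Urdu Kaf (Keheh)
})
STRIP = {"\u200D", "\u0640"}  # zero-width joiner, tatweel


def normalise_urdu(text: str) -> str:
    """Hypothetical Stage 0 sketch: unify code points, strip marks, tidy spaces."""
    text = text.translate(CHAR_MAP)
    # Drop ZWJ, tatweel, and all combining marks (category Mn, includes Shadda).
    text = "".join(
        ch for ch in text
        if ch not in STRIP and unicodedata.category(ch) != "Mn"
    )
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to both the reference and the hypothesis before scoring removes the variant-only mismatches described under "Why it matters".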
Split-Merge Alignment
Word boundaries as first-class data
What it does
Stage 1 performs source-anchored weighted Levenshtein alignment across every companion model. Each chunk is tagged SAME, SPLIT (1→n), MERGE (n→1) or NOISE.
Why it matters
SPLIT and MERGE together account for 36.5% of all inter-model events. ROVER-style fusion misreads them as substitutions, which corrupts the voting signal downstream.
Mechanism
- Source-anchored Levenshtein
- Per-chunk: SAME/SPLIT/MERGE/NOISE
- Info tags: MATCH/INS/DEL/SUB
- Raw + normalised attempts preserved
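As a rough sketch of the idea, the following word-level dynamic programme aligns one companion transcript against the source and tags each chunk. It rests on several assumptions that are not from CORAL itself: the function name `align`, the `max_span` and `split_cost` parameters, the cost values, and the use of exact string concatenation to detect SPLIT and MERGE (a real system would more likely use a weighted character-level similarity). Insertions, deletions and substitutions are all tagged NOISE here; the finer MATCH/INS/DEL/SUB info tags are omitted.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[str], List[str]]  # (tag, source words, companion words)


def align(source: List[str], other: List[str],
          max_span: int = 3, split_cost: float = 0.5) -> List[Chunk]:
    """Hypothetical sketch: weighted Levenshtein over words with 1:n and n:1 moves."""
    n, m = len(source), len(other)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            moves = []  # (previous i, previous j, step cost, tag)
            if i and j:
                same = source[i - 1] == other[j - 1]
                moves.append((i - 1, j - 1, 0.0 if same else 1.0,
                              "SAME" if same else "NOISE"))
            if i:
                moves.append((i - 1, j, 1.0, "NOISE"))  # companion dropped a word
            if j:
                moves.append((i, j - 1, 1.0, "NOISE"))  # companion added a word
            for k in range(2, max_span + 1):
                # SPLIT: one source word equals k companion words joined.
                if i and j >= k and source[i - 1] == "".join(other[j - k:j]):
                    moves.append((i - 1, j - k, split_cost, "SPLIT"))
                # MERGE: k source words joined equal one companion word.
                if j and i >= k and "".join(source[i - k:i]) == other[j - 1]:
                    moves.append((i - k, j - 1, split_cost, "MERGE"))
            for pi, pj, step, tag in moves:
                if cost[pi][pj] + step < cost[i][j]:
                    cost[i][j] = cost[pi][pj] + step
                    back[i][j] = (pi, pj, tag)
    # Walk the back-pointers from the end to recover source-anchored chunks.
    chunks, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj, tag = back[i][j]
        chunks.append((tag, source[pi:i], other[pj:j]))
        i, j = pi, pj
    return chunks[::-1]
```

Because a SPLIT or MERGE move is cheaper than the substitution-plus-insertion path a plain Levenshtein alignment would take, boundary disagreements surface as their own chunk type instead of being counted as substitutions.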
OOV + BK-tree Lookup
Hybrid out-of-vocabulary correction
What it does
Stage 2 detects tokens absent from the curated Urdu lexicon, queries a BK-tree for edit-distance neighbours, and re-ranks candidates by an n-gram language model conditioned on local context.
Why it matters
Pure edit-distance matching over-corrects (every unknown token looks like a typo); a pure n-gram model is under-determined. The combination is robust to dialect, named entities, and code-switched English.
Mechanism
- BK-tree over 500K-token corpus
- Top-K candidates ranked by LM+edit
- Frequency cut-off + depth tunable
- Returns full candidate metadata
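A compact sketch of the lookup is shown below. It is illustrative only: the `BKTree` class, the `rank` function, the `lam` weight and the toy bigram table are assumptions, and the scoring formula (edit distance minus a weighted log bigram count) is one plausible way to combine the two signals, not necessarily the one CORAL uses. English words stand in for Urdu lexicon entries to keep the example easy to read.

```python
import math
from typing import Dict, Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, {distance_to_child: child_node})."""

    def __init__(self, words: Iterable[str]):
        it = iter(words)
        self.root = (next(it), {})
        for word in it:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # already in the tree
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def query(self, word: str, max_dist: int) -> List[Tuple[str, int]]:
        found, stack = [], [self.root]
        while stack:
            entry, children = stack.pop()
            d = edit_distance(word, entry)
            if d <= max_dist:
                found.append((entry, d))
            # Triangle inequality: only these child edges can hold matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return found


def rank(candidates: List[Tuple[str, int]], prev_word: str,
         bigrams: Dict[Tuple[str, str], int], lam: float = 1.0):
    """Lower score is better: small edit distance, frequent in this context."""
    def score(c):
        return c[1] - lam * math.log1p(bigrams.get((prev_word, c[0]), 0))
    return sorted(candidates, key=score)
```

For instance, with a lexicon containing both "boo" and "book", the query "bok" at distance 1 returns both, and a bigram table that has seen ("a", "book") ranks "book" first after the word "a".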
Consensus Voting
Conservative ensemble fusion
What it does
Stage 3 walks the alignment column-by-column. For each position the source is preserved unless companion models reach a decisive consensus against it, at which point the OOV map is consulted.
Why it matters
Naive majority voting destroys signal when high-WER companions outvote a low-WER source — the classic ROVER failure. CORAL gives the source the benefit of the doubt.
Mechanism
- Position-wise tally over ensemble
- Source-bias: ties favour source
- OOV-aware overrides
- Position-level diff emitted
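The voting rule can be sketched as follows, assuming the companions have already been aligned so that every transcript has one token per source position. The function name `vote`, the `min_agree` threshold of 0.75, and the way the OOV map is applied (as a simple token-to-correction dictionary consulted only when the source is overridden) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote(source: List[str], companions: List[List[str]],
         oov_map: Optional[Dict[str, str]] = None,
         min_agree: float = 0.75) -> Tuple[List[str], List[Tuple[int, str, str]]]:
    """Hypothetical Stage 3 sketch: keep the source unless companions clearly disagree."""
    oov_map = oov_map or {}
    output, diff = [], []
    for pos, src_tok in enumerate(source):
        tally = Counter(c[pos] for c in companions)
        tally[src_tok] += 1  # the source votes for itself
        # Strongest token that disagrees with the source, if any.
        rival, rival_n = max(
            ((tok, n) for tok, n in tally.items() if tok != src_tok),
            key=lambda item: item[1],
            default=(None, 0),
        )
        # Strict '>' means ties favour the source.
        decisive = (rival_n > tally[src_tok]
                    and rival_n >= min_agree * len(companions))
        if decisive:
            winner = oov_map.get(rival, rival)  # OOV-aware override
            diff.append((pos, src_tok, winner))
        else:
            winner = src_tok
        output.append(winner)
    return output, diff
```

With four companions, a position where all four agree on a different token overrides the source, while a two-against-two split leaves the source untouched. The returned `diff` plays the role of the position-level diff listed above.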
LLM Refinement
Grammar pass with bounded authority
What it does
Stage 4 sends the voted output plus structured upstream metadata to a chat-tuned LLM acting as an Urdu linguist. It can fix gender, izafat, postpositions, conjugation — but cannot freely rewrite.
Why it matters
Some classes of error are beyond the reach of lexical alignment: ‘کرتا/کرتی/کرتے’ disagreement, missing ‘نے/کو/سے’, dialectal verb forms. The LLM stage closes that gap.
Mechanism
- System prompt encodes priorities
- Receives confidence-weighted hypothesis
- JSON: corrected + reason + changes
- Authority bounded by upstream
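One way to enforce "bounded authority" is to validate the LLM's reply before accepting it. The sketch below makes no call to any LLM API; it only checks a reply string. The three JSON keys come from the list above, while the function name `accept_refinement`, the `max_changed_ratio` parameter, and the rule of falling back to the voted text when too many tokens change are assumptions for illustration.

```python
import json
from difflib import SequenceMatcher

REQUIRED_KEYS = {"corrected", "reason", "changes"}


def accept_refinement(voted: str, reply: str,
                      max_changed_ratio: float = 0.3) -> str:
    """Return the LLM correction only if it is well-formed and close to the voted text."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return voted  # malformed reply: keep the deterministic output
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return voted
    corrected = data["corrected"]
    if not isinstance(corrected, str):
        return voted
    old, new = voted.split(), corrected.split()
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    # Count tokens touched by any non-equal edit operation.
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    if changed > max_changed_ratio * max(len(old), 1):
        return voted  # looks like a free rewrite, so reject it
    return corrected
```

Under this rule a single agreement fix in a four-word sentence is accepted, whereas a reply that replaces every word is discarded in favour of the Stage 3 output.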
— End-to-End Flow
Composable, deterministic, traceable.
Input: Ensemble outputs
Stage 0: Urdu Normalisation
Stage 1: Split-Merge Alignment
Stage 2: OOV + BK-tree Lookup
Stage 3: Consensus Voting
Stage 4: LLM Refinement
Output: Corrected transcript
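To illustrate what "composable" and "traceable" can mean in practice, here is a minimal sketch of a stage runner. The `run_pipeline` function and the representation of a stage as a `(name, function)` pair are hypothetical and are not taken from CORAL.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], payload: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Apply each stage in order and record every intermediate result."""
    trace = []
    for name, stage_fn in stages:
        payload = stage_fn(payload)
        trace.append((name, payload))  # one trace entry per stage
    return payload, trace
```

Because each stage is a plain function, any stage can be swapped or skipped, and the trace shows which stage introduced a given change.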
Want to see it move?
The interactive demo lets you step through every stage on your own audio or text.
Launch the Demo