— The Pipeline

Five stages.
One class of error each.

CORAL is not a model. It is a chain of four deterministic stages followed by a single LLM stage, each engineered to neutralise a specific failure mode in Urdu ASR.

00 · Urdu Normalisation
01 · Split-Merge Alignment
02 · OOV + BK-tree Lookup
03 · Consensus Voting
04 · LLM Refinement

Stage 00

Urdu Normalisation

Zero-risk Unicode unification

Example

before: كيا حال هے؟

after: کیا حال ہے

1.9 WER pts (ablation)

What it does

Stage 0 collapses Arabic-form code-points onto canonical Urdu, strips combining diacritics, harmonises hamza placement, removes tatweel, and normalises punctuation.

Why it matters

A substantial fraction of apparent ASR errors aren't recognition failures — they are measurement artefacts caused by Arabic↔Urdu Unicode variants on either side of the comparison.

Mechanism

Arabic Yeh ي → Urdu Yeh ی
Arabic Kaf ك → Urdu Kaf ک
Strip ZWJ + Shadda + Tatweel
Normalise whitespace + punct
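
The mapping steps above can be sketched in a few lines of Python. This is a minimal illustration, not CORAL's actual code: the function name `normalise_urdu`, the Heh mapping (inferred from the example), the punctuation set, and the choice to drop punctuation rather than rewrite it are all assumptions. Hamza handling is approximated by composing with NFC before the remaining combining marks are stripped.

```python
import unicodedata

# Assumed mapping table. The source names Yeh and Kaf explicitly;
# the Heh entry is inferred from the before/after example.
ARABIC_TO_URDU = {
    "\u064A": "\u06CC",  # Arabic Yeh -> Urdu Yeh
    "\u0643": "\u06A9",  # Arabic Kaf -> Urdu Kaf
    "\u0647": "\u06C1",  # Arabic Heh -> Urdu Heh (assumption)
}
STRIP = {"\u200D", "\u0640"}  # ZWJ and Tatweel
# Hypothetical punctuation set: Arabic question mark, Urdu full stop,
# Arabic comma, plus common ASCII marks.
PUNCT = set("\u061F\u06D4\u060C!?.,;:")


def normalise_urdu(text: str) -> str:
    # Compose first so base letter + combining hamza becomes one code point.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        ch = ARABIC_TO_URDU.get(ch, ch)
        # Drop ZWJ, Tatweel, and any remaining combining mark (e.g. Shadda).
        if ch in STRIP or unicodedata.combining(ch):
            continue
        # Punctuation becomes a space; whitespace is collapsed below.
        out.append(" " if ch in PUNCT else ch)
    return " ".join("".join(out).split())


# The page's own example, written with escapes so the code points are explicit.
sample = "\u0643\u064A\u0627 \u062D\u0627\u0644 \u0647\u06D2\u061F"
print(normalise_urdu(sample))
```

Applying the same function to both reference and hypothesis before scoring is what removes the Arabic/Urdu variant mismatches from the WER comparison.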

Stage 01

Split-Merge Alignment

Word boundaries as first-class data

Example

before: کیاہے کام

after: کیا ہے کام

36.5% of events are SPLIT/MERGE

What it does

Stage 1 performs source-anchored weighted Levenshtein alignment across every companion model. Each chunk is tagged SAME, SPLIT (1→n), MERGE (n→1) or NOISE.

Why it matters

SPLIT and MERGE together are 36.5% of all inter-model events. ROVER-style fusion misreads them as substitutions and corrupts the voting signal downstream.

Mechanism

Source-anchored Levenshtein
Per-chunk: SAME/SPLIT/MERGE/NOISE
Info tags: MATCH/INS/DEL/SUB
Raw + normalised attempts preserved
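
A minimal sketch of how SPLIT and MERGE can be treated as first-class edit operations follows. It assumes a token-level dynamic programme with hypothetical costs (0.5 for a split or merge, 1.0 for anything else) and a hypothetical `MAX_SPAN`; the real stage is described as weighted, so these numbers are placeholders. Everything that is not SAME, SPLIT, or MERGE is tagged NOISE here, which is coarser than the MATCH/INS/DEL/SUB info tags listed above.

```python
MAX_SPAN = 3  # hypothetical cap on how many tokens one split/merge may cover


def align(source, companion):
    """Align two token lists; return (tag, source_tokens, companion_tokens) chunks."""
    n, m = len(source), len(companion)
    inf = float("inf")
    # cost[i][j] = cheapest way to align source[i:] with companion[j:]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[n][m] = 0.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m:  # one-to-one
                same = source[i] == companion[j]
                options.append((cost[i + 1][j + 1] + (0.0 if same else 1.0),
                                "SAME" if same else "NOISE", 1, 1))
            for k in range(2, MAX_SPAN + 1):
                # SPLIT: one source token equals k concatenated companion tokens
                if i < n and j + k <= m and "".join(companion[j:j + k]) == source[i]:
                    options.append((cost[i + 1][j + k] + 0.5, "SPLIT", 1, k))
                # MERGE: k concatenated source tokens equal one companion token
                if j < m and i + k <= n and "".join(source[i:i + k]) == companion[j]:
                    options.append((cost[i + k][j + 1] + 0.5, "MERGE", k, 1))
            if i < n:  # source token with no counterpart
                options.append((cost[i + 1][j] + 1.0, "NOISE", 1, 0))
            if j < m:  # companion token with no counterpart
                options.append((cost[i][j + 1] + 1.0, "NOISE", 0, 1))
            best = min(options, key=lambda o: o[0])
            cost[i][j], back[i][j] = best[0], best[1:]
    chunks, i, j = [], 0, 0
    while (i, j) != (n, m):
        tag, di, dj = back[i][j]
        chunks.append((tag, source[i:i + di], companion[j:j + dj]))
        i, j = i + di, j + dj
    return chunks


companion = ["کیا", "ہے", "کام"]
source = ["".join(companion[:2]), companion[2]]  # the fused form from the example
for chunk in align(source, companion):
    print(chunk)
```

Because the source is always the first argument, a boundary disagreement is recorded as a single SPLIT or MERGE chunk rather than as a substitution plus an insertion, which is the misreading attributed to ROVER-style fusion above.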

Stage 02

OOV + BK-tree Lookup

Hybrid out-of-vocabulary correction

Example

before: بازر

after: بازار

500K lexicon tokens

What it does

Stage 2 detects tokens absent from the curated Urdu lexicon, queries a BK-tree for edit-distance neighbours, and re-ranks candidates by an n-gram language model conditioned on local context.

Why it matters

Pure edit-distance over-corrects (every unfamiliar token looks like a typo); pure n-gram is under-determined. The combination is robust to dialect, named entities, and code-switched English.

Mechanism

BK-tree over 500K-token corpus
Top-K candidates ranked by LM+edit
Frequency cut-off + depth tunable
Returns full candidate metadata
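
The lookup-then-re-rank idea can be sketched as below. This is an illustrative toy, not the production stage: the class `BKTree`, the helper `rerank`, the bigram-count table standing in for the n-gram language model, and the weight `alpha` are all hypothetical.

```python
import math


def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, {distance: child_node})."""

    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})
        for word in it:
            self.add(word)

    def add(self, word):
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # already present
            if d not in node[1]:
                node[1][d] = (word, {})
                return
            node = node[1][d]

    def query(self, word, max_dist):
        hits, stack = [], [self.root]
        while stack:
            term, children = stack.pop()
            d = edit_distance(word, term)
            if d <= max_dist:
                hits.append((d, term))
            # Triangle inequality: only these subtrees can contain matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return hits


def rerank(candidates, prev_word, bigram_counts, alpha=1.0):
    """Order (distance, word) pairs by toy LM score minus an edit-distance penalty."""
    def score(item):
        dist, word = item
        return math.log1p(bigram_counts.get((prev_word, word), 0)) - alpha * dist
    return [word for _, word in sorted(candidates, key=score, reverse=True)]


lexicon = ["بازار", "بازو", "باز", "کتاب"]
tree = BKTree(lexicon)
token = "بازر"
if token not in set(lexicon):  # OOV detection
    candidates = tree.query(token, max_dist=1)
    print(rerank(candidates, "پرانا", {("پرانا", "بازار"): 5}))
```

Edit distance alone leaves three candidates tied at distance 1 in this toy lexicon; the context score is what breaks the tie, which is the point of combining the two signals.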

Stage 03

Consensus Voting

Conservative ensemble fusion

Example

before: آیا ہے

after: آئی ہیں

Deterministic: no model in the loop

What it does

Stage 3 walks the alignment column-by-column. For each position the source is preserved unless companion models reach a decisive consensus against it, at which point the OOV map is consulted.

Why it matters

Naive majority voting destroys signal when high-WER companions outvote a low-WER source — the classic ROVER failure. CORAL gives the source the benefit of the doubt.

Mechanism

Position-wise tally over ensemble
Source-bias: ties favour source
OOV-aware overrides
Position-level diff emitted
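
A source-biased vote of this kind might look like the sketch below. The 0.75 threshold for "decisive consensus", the function names `vote` and `fuse`, and the shape of the OOV map (token to suggested replacement) are assumptions made for illustration; the page does not specify them.

```python
from collections import Counter


def vote(source_tok, companion_toks, oov_map, threshold=0.75):
    """Keep the source token unless companions decisively agree on another one."""
    tally = Counter(companion_toks)
    if not tally:
        return source_tok
    winner, count = tally.most_common(1)[0]
    decisive = count / len(companion_toks) >= threshold
    if winner == source_tok or not decisive:
        return source_tok  # ties and weak majorities favour the source
    # Consensus overrides the source; let the OOV map correct the winner if needed.
    return oov_map.get(winner, winner)


def fuse(columns, oov_map, threshold=0.75):
    """columns: list of (source_token, [companion_tokens]) per aligned position."""
    tokens, diff = [], []
    for pos, (src, companions) in enumerate(columns):
        chosen = vote(src, companions, oov_map, threshold)
        tokens.append(chosen)
        if chosen != src:
            diff.append((pos, src, chosen))  # position-level diff
    return tokens, diff


columns = [
    ("آیا", ["آئی", "آئی", "آئی", "آیا"]),
    ("ہے", ["ہیں", "ہیں", "ہیں", "ہیں"]),
]
tokens, diff = fuse(columns, oov_map={})
print(" ".join(tokens))
print(diff)
```

With any threshold above 0.5, an even split among companions can never displace the source token, which is one simple way to encode "ties favour source".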

Stage 04

LLM Refinement

Grammar pass with bounded authority

Example

before: بچہ سکول گیا ہیں

after: بچے سکول گئے ہیں

2.5 WER pts (ablation)

What it does

Stage 4 sends the voted output plus structured upstream metadata to a chat-tuned LLM acting as an Urdu linguist. It can fix gender, izafat, postpositions, conjugation — but cannot freely rewrite.

Why it matters

Some classes of error are beyond the reach of lexical alignment: ‘کرتا/کرتی/کرتے’ disagreement, missing ‘نے/کو/سے’, dialectal verb forms. The LLM stage closes that gap.

Mechanism

System prompt encodes priorities
Receives confidence-weighted hypothesis
JSON: corrected + reason + changes
Authority bounded by upstream
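
One way to enforce bounded authority is to validate the model's JSON reply against upstream confidence before accepting any change. The sketch below makes several assumptions that the page does not confirm: the `changes` field is taken to be a list of `{"index", "from", "to"}` objects, per-token confidences are assumed to be available, and `lock_at` is a hypothetical threshold above which a token may not be edited. No LLM client is shown; the reply is passed in as a string.

```python
import json

REQUIRED_KEYS = {"corrected", "reason", "changes"}


def apply_refinement(voted_tokens, confidences, llm_reply, lock_at=0.9):
    """Apply only those LLM changes that upstream confidence permits."""
    reply = json.loads(llm_reply)
    missing = REQUIRED_KEYS - set(reply)
    if missing:
        raise ValueError(f"LLM reply is missing keys: {sorted(missing)}")
    out = list(voted_tokens)
    rejected = []
    for change in reply["changes"]:
        i = change["index"]
        allowed = (
            0 <= i < len(out)
            and out[i] == change["from"]   # the change must match the voted text
            and confidences[i] < lock_at   # high-confidence tokens are locked
        )
        if allowed:
            out[i] = change["to"]
        else:
            rejected.append(change)
    return out, rejected


voted = ["بچہ", "سکول", "گیا", "ہیں"]
confidences = [0.4, 0.95, 0.5, 0.9]
reply = json.dumps({
    "corrected": "بچے اسکول گئے ہیں",
    "reason": "plural agreement",
    "changes": [
        {"index": 0, "from": "بچہ", "to": "بچے"},
        {"index": 1, "from": "سکول", "to": "اسکول"},  # locked upstream
        {"index": 2, "from": "گیا", "to": "گئے"},
    ],
})
tokens, rejected = apply_refinement(voted, confidences, reply)
print(" ".join(tokens))
print(rejected)
```

This sketch rebuilds the output from the itemised changes instead of trusting the free-text `corrected` field, so every accepted edit is traceable to one entry. It handles only token replacements; inserting a missing postposition would need a richer change schema.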

— End-to-End Flow

Composable, deterministic, traceable.

Input: Ensemble outputs
→ Stage 00 · Urdu Normalisation
→ Stage 01 · Split-Merge Alignment
→ Stage 02 · OOV + BK-tree Lookup
→ Stage 03 · Consensus Voting
→ Stage 04 · LLM Refinement
→ Output: Corrected transcript

Want to see it move?

The interactive demo lets you step through every stage on your own audio or text.

