CORAL

— The Pipeline

Five stages.
One class of error each.

CORAL is not a model. It is a chain of four deterministic algorithms followed by a single LLM stage, and each stage is engineered to neutralise a specific failure mode in Urdu ASR.

Stage 00

Urdu Normalisation

Zero-risk Unicode unification

Example

before: كيا حال هے؟
after: کیا حال ہے
1.9 WER pts (ablation)

What it does

Stage 0 maps Arabic-form code points onto their canonical Urdu equivalents, strips combining diacritics, harmonises hamza placement, removes tatweel, and normalises punctuation.

Why it matters

A substantial fraction of apparent ASR errors aren't recognition failures — they are measurement artefacts caused by Arabic↔Urdu Unicode variants on either side of the comparison.

Mechanism

  • Arabic Yeh ي → Urdu Yeh ی
  • Arabic Kaf ك → Urdu Kaf ک
  • Strip ZWJ + Shadda + Tatweel
  • Normalise whitespace + punct
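
The mechanism above can be sketched in a few lines of Python. This is a minimal illustration and not CORAL's actual code: the function name `normalise_urdu` is hypothetical, the Heh mapping is inferred from the example rather than listed in the mechanism, hamza harmonisation is omitted, and replacing punctuation with whitespace is an assumption about what "normalise" means here.

```python
import re

# Arabic-form code points mapped to their Urdu counterparts.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic Yeh -> Urdu Yeh
    "\u0643": "\u06A9",  # Arabic Kaf -> Urdu Kaf
    "\u0647": "\u06C1",  # Arabic Heh -> Urdu Heh Goal (assumed from the example)
})

# ZWJ, tatweel, and the combining marks U+064B..U+0652 (includes shadda).
STRIP = re.compile("[\u200D\u0640\u064B-\u0652]")

# Arabic question mark, Arabic comma, Urdu full stop, plus ASCII punctuation.
PUNCT = re.compile("[\u061F\u060C\u06D4?!,.]")


def normalise_urdu(text: str) -> str:
    """Unify code points, drop marks, and collapse whitespace."""
    text = text.translate(CHAR_MAP)
    text = STRIP.sub("", text)
    text = PUNCT.sub(" ", text)  # assumption: punctuation is dropped
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalise_urdu("كيا حال هے؟"))
```

Applying the same function to both the reference and the hypothesis before scoring is what removes the measurement artefacts described above.
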
Stage 01

Split-Merge Alignment

Word boundaries as first-class data

Example

before: کیاہے کام
after: کیا ہے کام
36.5% of events are SPLIT/MERGE

What it does

Stage 1 performs source-anchored weighted Levenshtein alignment across every companion model. Each chunk is tagged SAME, SPLIT (1→n), MERGE (n→1) or NOISE.

Why it matters

SPLIT and MERGE together are 36.5% of all inter-model events. ROVER-style fusion misreads them as substitutions and corrupts the voting signal downstream.

Mechanism

  • Source-anchored Levenshtein
  • Per-chunk: SAME/SPLIT/MERGE/NOISE
  • Info tags: MATCH/INS/DEL/SUB
  • Raw + normalised attempts preserved
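
To make the chunk tags concrete, here is a deliberately simplified sketch. It is a greedy two-pointer walk that detects SPLIT and MERGE by exact concatenation, not the source-anchored weighted Levenshtein alignment the stage describes; the names `tag_chunks`, `_span`, and `max_span` are hypothetical.

```python
def _span(token, tokens, start, max_span):
    """Return n >= 2 if tokens[start:start+n] joined equals token, else 0."""
    for n in range(2, max_span + 1):
        if start + n <= len(tokens) and "".join(tokens[start:start + n]) == token:
            return n
    return 0


def tag_chunks(source, companion, max_span=3):
    """Tag aligned chunks as SAME, SPLIT (1->n), MERGE (n->1) or NOISE."""
    chunks = []
    i = j = 0
    while i < len(source) and j < len(companion):
        if source[i] == companion[j]:
            chunks.append(("SAME", source[i:i + 1], companion[j:j + 1]))
            i += 1
            j += 1
            continue
        # SPLIT: one source token equals several companion tokens joined.
        n = _span(source[i], companion, j, max_span)
        if n:
            chunks.append(("SPLIT", source[i:i + 1], companion[j:j + n]))
            i += 1
            j += n
            continue
        # MERGE: several source tokens joined equal one companion token.
        n = _span(companion[j], source, i, max_span)
        if n:
            chunks.append(("MERGE", source[i:i + n], companion[j:j + 1]))
            i += n
            j += 1
            continue
        # Anything else is treated as a one-to-one NOISE pair in this sketch.
        chunks.append(("NOISE", source[i:i + 1], companion[j:j + 1]))
        i += 1
        j += 1
    if i < len(source) or j < len(companion):
        chunks.append(("NOISE", source[i:], companion[j:]))
    return chunks


if __name__ == "__main__":
    for chunk in tag_chunks(["کیاہے", "کام"], ["کیا", "ہے", "کام"]):
        print(chunk)
```

A real aligner would also score near-matches and recover from insertions, which this greedy walk does not attempt.
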
Stage 02

OOV + BK-tree Lookup

Hybrid out-of-vocabulary correction

Example

before: بازر
after: بازار
500K lexicon tokens

What it does

Stage 2 detects tokens absent from the curated Urdu lexicon, queries a BK-tree for edit-distance neighbours, and re-ranks candidates by an n-gram language model conditioned on local context.

Why it matters

Pure edit-distance over-corrects, because every unfamiliar but valid token looks like a typo; a pure n-gram model is under-determined. The combination is robust to dialect, named entities, and code-switched English.

Mechanism

  • BK-tree over 500K-token corpus
  • Top-K candidates ranked by LM+edit
  • Frequency cut-off + depth tunable
  • Returns full candidate metadata
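
The lookup step can be illustrated with a small BK-tree. This is a sketch under stated assumptions: the class `BKTree`, the function `suggest`, and the toy frequency table are hypothetical, and unigram frequency stands in for the context-conditioned n-gram language model that the stage actually uses for re-ranking.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children) with children keyed by edit distance."""

    def __init__(self, words):
        words = iter(words)
        self.root = (next(words), {})
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def query(self, word, max_dist):
        """Return (distance, term) pairs within max_dist of word."""
        results, stack = [], [self.root]
        while stack:
            term, children = stack.pop()
            d = levenshtein(word, term)
            if d <= max_dist:
                results.append((d, term))
            # Triangle inequality: only these child edges can hold matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return results


def suggest(token, tree, freq, max_dist=1):
    """Keep in-lexicon tokens; otherwise pick the closest, most frequent one."""
    if token in freq:
        return token
    candidates = tree.query(token, max_dist)
    if not candidates:
        return token
    return min(candidates, key=lambda c: (c[0], -freq[c[1]]))[1]


if __name__ == "__main__":
    freq = {"بازار": 120, "بازو": 40, "باز": 15, "کام": 300}  # toy counts
    print(suggest("بازر", BKTree(freq), freq))
```

In this toy lexicon three entries sit at distance 1 from the query, which is why a second ranking signal is needed on top of edit distance.
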
Stage 03

Consensus Voting

Conservative ensemble fusion

Example

before: آیا ہے
after: آئی ہیں
Deterministic: no model in the loop

What it does

Stage 3 walks the alignment column-by-column. For each position the source is preserved unless companion models reach a decisive consensus against it, at which point the OOV map is consulted.

Why it matters

Naive majority voting destroys signal when high-WER companions outvote a low-WER source — the classic ROVER failure. CORAL gives the source the benefit of the doubt.

Mechanism

  • Position-wise tally over ensemble
  • Source-bias: ties favour source
  • OOV-aware overrides
  • Position-level diff emitted
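
A source-biased vote can be sketched as follows. The function name `vote`, the `threshold` value of 0.75, and the way `oov_map` is applied to the winning token are all assumptions made for illustration; the sketch also assumes the hypotheses are already aligned to equal length, which Stage 1 would provide.

```python
from collections import Counter


def vote(source, companions, threshold=0.75, oov_map=None):
    """Keep the source token unless companions decisively agree against it."""
    if not companions:
        return list(source)
    oov_map = oov_map or {}
    output = []
    for pos, src_tok in enumerate(source):
        tally = Counter(hyp[pos] for hyp in companions)
        winner, count = tally.most_common(1)[0]
        decisive = winner != src_tok and count / len(companions) >= threshold
        if decisive:
            # Assumption: the OOV map may rewrite the winning token.
            output.append(oov_map.get(winner, winner))
        else:
            # Ties and weak majorities leave the source untouched.
            output.append(src_tok)
    return output


if __name__ == "__main__":
    source = ["آیا", "ہے"]
    companions = [["آئی", "ہیں"]] * 3 + [["آیا", "ہے"]]
    print(vote(source, companions))
```

Raising `threshold` makes the stage more conservative; at 1.0 the source changes only when every companion disagrees with it in the same way.
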
Stage 04

LLM Refinement

Grammar pass with bounded authority

Example

before: بچہ سکول گیا ہیں
after: بچے سکول گئے ہیں
2.5 WER pts (ablation)

What it does

Stage 4 sends the voted output plus structured upstream metadata to a chat-tuned LLM acting as an Urdu linguist. It can fix gender, izafat, postpositions, conjugation — but cannot freely rewrite.

Why it matters

Some classes of error are beyond the reach of lexical alignment: ‘کرتا/کرتی/کرتے’ disagreement, missing ‘نے/کو/سے’, dialectal verb forms. The LLM stage closes that gap.

Mechanism

  • System prompt encodes priorities
  • Receives confidence-weighted hypothesis
  • JSON: corrected + reason + changes
  • Authority bounded by upstream
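
One way to enforce bounded authority is to validate the model's JSON reply and fall back to the voted text when it changes too much. The sketch below makes no LLM call and is only an assumption about how such a guard could look: `accept_refinement`, `changed_tokens`, and the `max_changes` budget are hypothetical, and only the three JSON keys named above come from the description.

```python
import json
from difflib import SequenceMatcher


def changed_tokens(before: str, after: str) -> int:
    """Count tokens touched by replace, insert, or delete operations."""
    a, b = before.split(), after.split()
    ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return sum(max(i2 - i1, j2 - j1) for tag, i1, i2, j1, j2 in ops if tag != "equal")


def accept_refinement(voted: str, raw_response: str, max_changes: int = 2) -> str:
    """Return the LLM correction only if it is well-formed and small."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return voted  # malformed reply: keep the deterministic output
    if not isinstance(payload, dict):
        return voted
    corrected = payload.get("corrected")
    if not isinstance(corrected, str) or "reason" not in payload or "changes" not in payload:
        return voted
    if changed_tokens(voted, corrected) > max_changes:
        return voted  # treated as a free rewrite and rejected
    return corrected


if __name__ == "__main__":
    voted = "بچہ سکول گیا ہیں"
    reply = json.dumps({
        "corrected": "بچے سکول گئے ہیں",
        "reason": "number agreement",
        "changes": ["بچہ->بچے", "گیا->گئے"],
    }, ensure_ascii=False)
    print(accept_refinement(voted, reply))
```

A production guard would likely weight the budget by upstream confidence rather than use a fixed token count.
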

— End-to-End Flow

Composable, deterministic, traceable.

Input: Ensemble outputs
  → Stage 00: Urdu Normalisation
  → Stage 01: Split-Merge Alignment
  → Stage 02: OOV + BK-tree Lookup
  → Stage 03: Consensus Voting
  → Stage 04: LLM Refinement
  → Output: Corrected transcript
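
The composable and traceable properties can be pictured as a list of named stage functions whose intermediate outputs are recorded. This is an illustrative sketch only: `run_pipeline` and the placeholder stages are hypothetical and say nothing about how CORAL wires its stages internally.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(data: Any, stages: List[Stage]):
    """Apply each stage in order and keep a per-stage trace."""
    trace = []
    for name, stage in stages:
        data = stage(data)
        trace.append((name, data))  # what each stage produced, for inspection
    return data, trace


if __name__ == "__main__":
    # Placeholder stages standing in for the five real ones.
    stages = [
        ("Stage 00", lambda text: " ".join(text.split())),
        ("Stage 01", lambda text: text.split()),
    ]
    result, trace = run_pipeline("  کیا   ہے  کام ", stages)
    for name, value in trace:
        print(name, value)
```

Because each stage is an ordinary function, any one of them can be dropped for an ablation run without touching the others.
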

Want to see it move?

The interactive demo lets you step through every stage on your own audio or text.
