Phase 1 · recording now · Addis Ababa

Documenting Ethiopian Sign Language in 3D & Virtual Reality.

We capture ESL on Meta Quest 3, every finger and every shift of the shoulder, with the people who actually sign it.

FIG. 01 / RIGHT HAND · OPEN PALM · "ሰላም" (/sɛlam/, HELLO) · FRAME 0182 / 1080
REC · HAND TOPOLOGY 21 JOINTS · CAPTURE 72 Hz
word : ሰላም · frame : 0182 · x : 0.142 · y : 0.518 · z : -0.07 · conf : 0.98
§ OPEN SOURCE · CONTRIBUTE Natnael1234 / Ejelisan1405

The capture tool is being built in Unity. Help us ship it.

Open source — built in the open so deaf communities, researchers, and Unity developers can shape how a language is documented.

UNITY · QUEST 3 · C#
§ 01 — Mission

Changing the foundation, not the surface.

ESL has been left out of the digital world — no dataset, no tools, no AI that can read it. We start with documentation, because nothing else can be built without it.

§ 02 — Phases

Three phases. One foundation.

PHASE 01

Documentation.

Record professional trainers on Quest 3. Hands, body, and head saved as 3D motion, labelled with the Amharic and English word.

STATUS · IN PROGRESS
PHASE 02

Learning.

The dataset becomes a classroom. A VR room, a 3D instructor, practice sessions, AI-assisted feedback on the learner's own signs.

STATUS · IN DESIGN
PHASE 03

Interpretation.

A 3D AI avatar that converts spoken Amharic into sign language live — for broadcasts, classrooms, and public services.

STATUS · FUTURE
§ 03 — Capture

A word, a sign, 72 frames a second.

Built in Unity with the Meta XR SDK and Movement SDK. Runs entirely on the headset.

  1. The trainer puts on a Quest 3.
  2. An Amharic word appears, for example ሰላም.
  3. They perform the sign.
  4. Hand, body, and head are captured in 3D, 72 times a second.
  5. The motion is saved with its Amharic and English labels.
  6. A green ring shows when the trainer is steady; a red ring counts to five. Then the next word appears.
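
The "steady" check in the last step could work roughly as follows. This is a minimal sketch in Python for readability (the capture app itself is built in Unity). The function name, the 2 cm movement threshold, the half-second hold, and the assumption that positions are in metres are all illustrative and not taken from the project.

```python
from math import dist

CAPTURE_HZ = 72  # capture rate stated on this page


def first_steady_frame(positions, max_move_m=0.02, hold_frames=CAPTURE_HZ // 2):
    """Return the index of the frame at which a tracked point (e.g. the wrist)
    has stayed within max_move_m of an anchor position for hold_frames
    consecutive frames, or None if that never happens.

    positions: list of (x, y, z) tuples, one per captured frame.
    max_move_m and hold_frames are hypothetical tuning values.
    """
    anchor = None
    held = 0
    for i, pos in enumerate(positions):
        if anchor is None or dist(pos, anchor) > max_move_m:
            # Too much movement: restart the hold from this frame.
            anchor, held = pos, 1
        else:
            held += 1
        if held >= hold_frames:
            return i
    return None
```

At 72 Hz, the default of 36 frames corresponds to half a second of stillness. A real implementation would likely look at several joints rather than one point.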
§ 04 — Dataset & AI

Many samples. One canonical sign.

A single recording is not a language. Every word is captured many times, by many trainers — variations in hand size, signing speed, and regional style become features, not noise. AI does the rest: aligning, cleaning, clustering, and labelling each frame so the dataset can teach a model what a sign is.

01
Collect

Multi-sample capture.

Each word: multiple trainers · multiple takes · multiple speeds. Coverage across dialects and signing styles.

10–30 SAMPLES / WORD
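
A coverage check along these lines could flag words that are still short of the sample target. It is a sketch only: the `Clip` fields and the function name are hypothetical, since the project's clip format is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_SAMPLES = 10  # lower bound of the 10–30 samples-per-word target


@dataclass(frozen=True)
class Clip:
    word_am: str     # Amharic label, e.g. "ሰላም"
    word_en: str     # English label, e.g. "hello"
    trainer_id: str  # hypothetical anonymised trainer identifier
    take: int        # take number for this trainer and word


def under_covered(clips, min_samples=MIN_SAMPLES):
    """Map each English label to (sample count, trainer count) for every
    word whose sample count is still below min_samples."""
    samples = defaultdict(int)
    trainers = defaultdict(set)
    for clip in clips:
        samples[clip.word_en] += 1
        trainers[clip.word_en].add(clip.trainer_id)
    return {
        word: (count, len(trainers[word]))
        for word, count in samples.items()
        if count < min_samples
    }
```

Reporting the trainer count next to the sample count makes it easy to spot words recorded many times by only one person.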
02
Process · AI

Align, cluster, label.

ML models segment the steady portion, align takes onto a common timeline, cluster variants, and write per-frame labels. Humans review the edge cases.

EMBEDDINGS · CLUSTERS · LABELS
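
Aligning takes onto a common timeline is often done with dynamic time warping (DTW). Whether this project uses DTW is not stated, so the sketch below shows one possible approach, not the pipeline's actual method. For brevity each frame is reduced to a single 3D point; a real take would carry every joint.

```python
from math import dist, inf


def dtw_path(take_a, take_b):
    """Align two takes (lists of (x, y, z) points) with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) pairs saying
    that frame i of take_a was matched with frame j of take_b.
    """
    n, m = len(take_a), len(take_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(take_a[i - 1], take_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # both takes advance one frame
                cost[i - 1][j],      # take_a advances, take_b holds
                cost[i][j - 1],      # take_b advances, take_a holds
            )
    # Backtrack from the last frame pair to recover the matched frames.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

A slow take and a fast take of the same motion should align at low cost, with several slow frames mapped onto one fast frame. That is how differences in signing speed can be absorbed before clustering.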
03
Output

A canonical sign per word.

The dataset ships with a clean, reproducible signing of each word — plus its full variant library. Ready for a VR classroom, ready for a model to learn from.
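
One simple way to pick a canonical take from a set of variants is the medoid: the recorded take with the smallest total distance to all the others. The page does not say how the canonical sign is chosen, so this is an assumption. The distance function is left as a parameter; an alignment cost such as DTW would be one option.

```python
def medoid_index(takes, distance):
    """Return the index of the take whose summed distance to every other
    take is smallest. `distance` is any symmetric function of two takes."""
    if not takes:
        raise ValueError("need at least one take")
    totals = [
        sum(distance(a, b) for j, b in enumerate(takes) if j != i)
        for i, a in enumerate(takes)
    ]
    # Ties resolve to the earliest take.
    return min(range(len(takes)), key=totals.__getitem__)
```

Because the medoid is always a take someone actually performed, the canonical sign stays a real recording and not an averaged motion.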

→ TOWARD PHASE 03

The same AI that builds the dataset becomes the interpreter.

Once we can recognise signs in 3D, we can generate them too. A speech-to-sign model takes spoken Amharic, predicts the sequence of canonical signs, and drives a 3D avatar in real time — overlaid on a TV broadcast, in a classroom, or on a phone.

speech → text · text → sign sequence · sign → 3D avatar
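
The three stages compose naturally as functions. The sketch below is a toy illustration only: the speech recogniser is a stub, the lexicon and sign ID are invented for the example, and a word-by-word lookup ignores the fact that a sign language has its own grammar, which a real text-to-sign model would have to handle.

```python
# Hypothetical lexicon: Amharic word -> canonical sign ID in the dataset.
LEXICON = {"ሰላም": "SIGN_HELLO"}


def speech_to_text(audio):
    """Stand-in for an Amharic speech recogniser (not implemented here)."""
    raise NotImplementedError("plug in a speech recogniser")


def text_to_sign_sequence(text, lexicon=LEXICON):
    """Naive lookup: known words map to sign IDs; unknown words are kept
    as ("UNKNOWN", word) so a later stage can decide what to do with them."""
    return [lexicon.get(word, ("UNKNOWN", word)) for word in text.split()]


def sign_sequence_to_avatar(signs):
    """Stand-in for the avatar driver: returns the queue of sign IDs that
    would be played back, skipping unknown words."""
    return [sign for sign in signs if isinstance(sign, str)]
```

For example, `text_to_sign_sequence("ሰላም")` yields the single invented ID `"SIGN_HELLO"`, which the avatar stage passes through unchanged.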
§ 05 — Roadmap

The path, step by step.

  1. PHASE 01 / 01 · NOW

    Recording setup & configuration platform.

    Building the Unity capture app on Meta Quest 3: hand and upper-body tracking, word-prompt flow, steady-state detection, labelled clip storage. This is the foundation everything else records onto.

  2. PHASE 01 / 02

    Trainer onboarding & pilot sessions.

    Bring professional ESL trainers into the loop. Refine the capture experience with them, settle the consent and pay model.

  3. PHASE 01 / 03

    Multi-sample word capture at scale.

    Record the everyday vocabulary — many takes per word, across multiple trainers. Daily verbs, family terms, religious and civic vocabulary.

  4. PHASE 01 / 04

    AI dataset pipeline.

    Cleaning, alignment, clustering, and labelling at scale. First public sample release under an open license.

  5. PHASE 02

    VR learning module.

    A learner enters a virtual room with a 3D instructor, practises signs, and gets AI feedback on their own hand pose.

  6. PHASE 03

    Real-time AI interpretation avatar.

    Speech-to-sign in real time. A 3D avatar that converts spoken Amharic into ESL on TV, in classrooms, and in public services.

§ 06 — Get involved

There's a place for you in this work.

FOR · TRAINERS

ESL teacher or interpreter?

Help us capture signs. Sessions are paid and flexible.

Get in touch
FOR · DEVELOPERS

Developer or researcher?

We'll release dataset samples and tools under an open license.

Follow along
FOR · SUPPORTERS

Supporter or funder?

We are seeking partners for Phase 2.

Let's talk