Detail of cuneiform inscriptions on stone — wedge-shaped marks of the world's first writing system, the raw material the engine reads

Under the hood

The engine.
A translator that grows.

How the tablets become readable — honestly, layer by layer, and how the system gets smarter every time we use it.

What it is

A pipeline, not a magic box.

The engine is a translation pipeline. Its core is Claude — Anthropic's large language model — wrapped in a system that feeds it the right context for every tablet it sees. It is honest about what it knows, and it gets better the more we use it.

Claude does not learn between sessions. The weights of the model don't change as we work. What changes is the knowledge base around it — the glossary of Sumerian and Akkadian terms we build, the parallel passages we collect, the scholar notes we attach to each tablet, the translations we validate. That base is where the intelligence accumulates.

The technique has a name in the field: retrieval-augmented generation, or RAG. It is mature, well-understood, and exactly the right shape for a project that gathers a corpus over time.
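To make the idea concrete, here is a minimal sketch of the retrieval half of RAG, written under stated assumptions. The `Passage` record, the word-overlap scoring, and the function names are all hypothetical and chosen for illustration; the site does not describe how its retrieval is actually implemented, and a real system would more likely use embeddings or a search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """One previously translated passage (hypothetical record shape)."""
    tablet_id: str
    transliteration: str
    translation: str
    verified: bool = False  # True once a scholar has confirmed it


def tokens(transliteration: str) -> set[str]:
    """Split a transliteration into lowercase, whitespace-separated words."""
    return set(transliteration.lower().split())


def retrieve(query: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Return up to k passages that share the most words with the query.

    Passages with no shared word are dropped; ties go to verified passages.
    Word overlap is an assumption made for this sketch only.
    """
    q = tokens(query)
    scored = [(len(q & tokens(p.transliteration)), p) for p in passages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda sp: (sp[0], sp[1].verified), reverse=True)
    return [p for _, p in scored[:k]]


def build_context(query: str, passages: list[Passage], glossary: dict[str, str]) -> str:
    """Assemble retrieved passages and glossary entries into prompt text."""
    lines = ["Parallel passages:"]
    for p in retrieve(query, passages):
        tier = "verified" if p.verified else "unverified"
        lines.append(f"- [{tier}] {p.transliteration} => {p.translation}")
    lines.append("Glossary:")
    for word in sorted(tokens(query)):
        if word in glossary:
            lines.append(f"- {word}: {glossary[word]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only, not taken from the project's corpus.
    corpus = [
        Passage("demo-1", "lugal e2 mu-na-du3", "the king built the house for him", True),
        Passage("demo-2", "dingir gal", "the great god"),
    ]
    print(build_context("lugal e2", corpus, {"lugal": "king", "e2": "house, temple"}))
```

The point of the sketch is that the model is never retrained: only the `passages` and `glossary` arguments grow, and the prompt grows richer with them.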

The pipeline

How a translation comes to be.

When we ask the engine to translate a tablet, it runs through six steps:

1. Fetch transliteration + photograph
The Latin-alphabet rendering of the cuneiform signs, drawn from CDLI or ORACC where it exists, and the tablet's photograph from our local archive.
2. Retrieve context
Parallel passages we have already translated (formulaic twins with scholar translations), glossary entries for the words appearing here, the historical period's conventions.
3. Prompt Claude, with the photo
A structured prompt assembles all of the above. The model examines the photograph directly and cross-checks the transliteration against what it can actually see: damaged passages become brackets, not guesses. Tablets translated without a photograph are explicitly labelled as indicative readings.
4. Receive four layers
A literal translation, an interpretation for the general reader, a modern-English rendering, plus a confidence assessment, flagged uncertain terms, and the model's reasoning.
5. Display with confidence indicator
The reader sees the translation alongside a visible confidence badge (verified / high / medium / low / experimental), the source, and a 'read from photo' indicator when the image was examined.
6. Capture scholar feedback
When a specialist confirms or corrects a translation, the result enters a 'verified' tier and feeds future retrievals, making subsequent translations of related tablets more accurate.
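Steps 3 to 5 can be sketched as two small functions: one that assembles the structured prompt, and one that turns the model's reply into a record ready for display. This is an illustration under assumptions, not the site's code. The `Tablet` and `Translation` shapes, the field names, and the prompt wording are hypothetical, and the call to the model itself is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tablet:
    tablet_id: str
    transliteration: str
    photo_path: Optional[str] = None  # None when no photograph is on file


@dataclass
class Translation:
    literal: str
    interpretation: str
    modern_english: str
    confidence: str
    uncertain_terms: list[str] = field(default_factory=list)
    reasoning: str = ""
    read_from_photo: bool = False
    indicative_only: bool = False


def build_prompt(tablet: Tablet, context: str) -> str:
    """Step 3: combine transliteration, retrieved context, and instructions."""
    if tablet.photo_path is not None:
        photo_note = (
            "A photograph is attached. Cross-check the transliteration against "
            "it and mark damaged passages with brackets instead of guessing."
        )
    else:
        photo_note = "No photograph is available. Treat the result as an indicative reading."
    return "\n\n".join([
        f"Tablet: {tablet.tablet_id}",
        f"Transliteration:\n{tablet.transliteration}",
        f"Context:\n{context}",
        photo_note,
        "Return: literal, interpretation, modern_english, confidence, "
        "uncertain_terms, reasoning.",
    ])


def to_translation(tablet: Tablet, reply: dict) -> Translation:
    """Steps 4-5: turn the model's parsed reply into a displayable record."""
    has_photo = tablet.photo_path is not None
    return Translation(
        literal=reply["literal"],
        interpretation=reply["interpretation"],
        modern_english=reply["modern_english"],
        confidence=reply["confidence"],
        uncertain_terms=list(reply.get("uncertain_terms", [])),
        reasoning=reply.get("reasoning", ""),
        read_from_photo=has_photo,
        indicative_only=not has_photo,  # no photo means an indicative reading
    )
```

Deriving `read_from_photo` and `indicative_only` from the input, instead of trusting the model to report them, keeps the label on the page tied to what was actually supplied.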

The compounding effect

The project itself becomes the brain.

On day one, the engine has no domain knowledge beyond what Claude was trained on. Useful, but generic.

On day a hundred, the engine has hundreds of validated translations, a working glossary of recurring Sumerian terms, cross-references across periods, and the curated notes of every scholar who has corrected something. Every new translation request retrieves from this richer base, so translations sharpen as the base grows.

On day a thousand, if this project survives, the knowledge base will be a serious resource in its own right — usable by other scholars, by other tools, by the next person who tries to do something like this.

The model does not get smarter. The system does. That distinction matters.

Measured, every night

The engine takes an exam before it publishes.

An engine that grows is only worth trusting if something independent checks that it is actually getting better — and never starts inventing. So every night, after the batch of new translations, the pipeline grades itself twice before anything is deployed:

The confabulation test. The engine is shown trap tablets whose photograph and transliteration deliberately do not match. Passing means refusing to translate what it cannot see. A single failure blocks that night's publication until a human has looked.
The gold benchmark. A fixed panel of 35 tablets with published scholar translations, spread across every period. The engine translates them blind (it never sees the reference), and a second, stronger model grades accuracy and coverage against the scholar's text, penalising wrong numbers and wrong names hardest. The scores accumulate into a public trend: the number that has to rise over the months.
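The two nightly checks can be pictured as a single gate function. The sketch below is an assumption-laden illustration: `TrapResult`, `GoldScore`, and `nightly_gate` are hypothetical names, and scores are assumed to lie between 0 and 1. Following the text, only a confabulation failure blocks publication here; the gold scores are simply averaged for the trend.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrapResult:
    tablet_id: str
    refused: bool  # True if the engine declined to translate the mismatched tablet


@dataclass
class GoldScore:
    tablet_id: str
    accuracy: float  # grader's score, assumed to be in the range 0..1
    coverage: float


def nightly_gate(traps: list[TrapResult], gold: list[GoldScore]) -> dict:
    """Decide whether tonight's batch may be published.

    A single trap tablet that was translated instead of refused blocks
    publication and is listed for human review.
    """
    failed = [t.tablet_id for t in traps if not t.refused]
    return {
        "publish": not failed,
        "failed_traps": failed,
        "mean_accuracy": mean(g.accuracy for g in gold) if gold else None,
        "mean_coverage": mean(g.coverage for g in gold) if gold else None,
    }
```

For example, `nightly_gate([TrapResult("trap-1", refused=False)], [])` returns a result with `"publish": False` and `"failed_traps": ["trap-1"]`.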

Once a week, the engine goes further: it revises its own instructions. It reads its worst-graded cases of the week, proposes a revised system prompt, and the revision is gated — it must beat the current prompt on the same tablets and keep the confabulation test perfect, or it is rejected. Stagnation is the safe default; most Sundays, nothing changes. When a revision is promoted, every older translation is marked stale and gradually re-translated with the better prompt.
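The weekly promotion rule reduces to a short predicate. In this sketch, `should_promote` and its parameters are hypothetical, and "beat the current prompt" is interpreted as a strictly higher mean score on the same tablets; the site does not say which comparison it actually uses.

```python
from statistics import mean


def should_promote(
    current: dict[str, float],
    candidate: dict[str, float],
    candidate_trap_failures: int,
) -> bool:
    """Return True only if the revised prompt may replace the current one.

    `current` and `candidate` map tablet ids to graded scores for each prompt.
    """
    # The confabulation test must stay perfect.
    if candidate_trap_failures > 0:
        return False
    # The comparison is only meaningful on the same set of tablets.
    if set(current) != set(candidate):
        raise ValueError("both prompts must be graded on the same tablets")
    if not current:
        return False
    # Strict inequality: a tie is rejected, so stagnation is the default.
    return mean(candidate.values()) > mean(current.values())
```

A tie returns `False`, which matches the stated default that most weeks nothing changes.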

The experimental lab

From photo alone — the harder ambition.

The pipeline above starts from a transliteration that already exists, with the photograph as its witness. Most catalogued tablets have one. But the deeper ambition is to handle tablets that have never been transliterated — to take only a photograph of a tablet and return, eventually, a translation.

That requires three layers, each at a different level of scientific maturity:

Sign recognition: identifying cuneiform signs in an image. Active research; open models like DeepScribe (University of Chicago) work well on clean tablets, less well on damaged ones.
Transliteration: assembling identified signs into a Latin-alphabet sequence. Largely a solved problem for known sign repertoires.
Translation: turning that sequence into English. Mature for Akkadian (Akkademia at Yale & Tel Aviv, 2023). Much harder for Sumerian, a language isolate with less training data, less commercial interest, and nothing yet comparable to Akkademia.

The first layer is where we already stand in part: the production engine examines every photograph it is given. What is not in production is translation from a photo with no transliteration at all — reading raw signs from the clay. That will arrive as an explicitly experimental feature, with outputs marked accordingly. Building it from scratch would be a doctoral-level project; integrating existing open models and stacking Claude on top is a v2 effort.

Honesty, by design

Uncertainty is a feature, not a defect.

Most popular accounts of antiquity smooth over what we don't know. Academic accounts, by contrast, are precise about it but often inaccessible. This site tries to be both accessible and precise: it shows the reader what we know, what we suspect, and what is genuinely unknown.

Every translation on the site carries a visible badge:

Scholar-verified — vetted by a trained Assyriologist or drawn from an established scholarly edition.
High confidence — well-attested text, clear translation, drawn from open scholarly corpora.
Medium confidence — broadly accepted reading with some contested elements.
Low confidence — partial reading, disputed terms.
Experimental — AI-assisted hypothesis, not a substitute for trained scholarly work; treat as a starting point.
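The five tiers map naturally onto a small lookup table. The sketch below is illustrative only: the short keys follow the "verified / high / medium / low / experimental" wording used for the confidence badge, while `BADGE_LABELS`, `badge_text`, and the separator before "read from photo" are assumptions.

```python
# Short tier keys mapped to the labels shown to the reader.
BADGE_LABELS = {
    "verified": "Scholar-verified",
    "high": "High confidence",
    "medium": "Medium confidence",
    "low": "Low confidence",
    "experimental": "Experimental",
}


def badge_text(tier: str, read_from_photo: bool = False) -> str:
    """Return the badge label for a tier, rejecting unknown tiers outright."""
    if tier not in BADGE_LABELS:
        raise ValueError(f"unknown confidence tier: {tier!r}")
    label = BADGE_LABELS[tier]
    # The separator is a display assumption made for this sketch.
    return f"{label} | read from photo" if read_from_photo else label
```

Rejecting unknown tiers, instead of falling back to a default, means a translation can never be shown without one of the five documented badges.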

The open invitation

Help is welcome.

If you are a scholar, a developer, or simply someone who notices a mistake — please reach out. The engine is built to absorb corrections and improve from them. Every validated correction makes every future translation more accurate.

See the contact section on the about page.
