Sumerian·Book
Detail of cuneiform inscriptions on stone — wedge-shaped marks of the world's first writing system, the raw material the engine reads

Under the hood

The engine.
A translator that grows.

How the tablets become readable — honestly, layer by layer, and how the system gets smarter every time we use it.

What it is

A pipeline, not a magic box.

The engine is a translation pipeline. Its core is Claude — Anthropic's large language model — wrapped in a system that feeds it the right context for every tablet it sees. It is honest about what it knows, and it gets better the more we use it.

Claude does not learn between sessions. The weights of the model don't change as we work. What changes is the knowledge base around it — the glossary of Sumerian and Akkadian terms we build, the parallel passages we collect, the scholar notes we attach to each tablet, the translations we validate. That base is where the intelligence accumulates.

The technique has a name in the field: retrieval-augmented generation, or RAG. It is mature, well understood, and exactly the right shape for a project that gathers a corpus over time.
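To make the idea concrete, here is a minimal sketch of the retrieval half of RAG. It is an illustration only, not the engine's actual code: the `Entry` and `KnowledgeBase` classes, the token-overlap scoring, and the prompt wording are all assumptions made for this example. A production system would more likely rank entries with embeddings or a search index. The three glossary entries are common Sumerian words used as sample data.

```python
import re
from dataclasses import dataclass, field


def tokens(transliteration: str) -> set[str]:
    """Split a transliteration into lowercase sign readings."""
    return {t for t in re.split(r"[\s\-]+", transliteration.lower()) if t}


@dataclass
class Entry:
    kind: str      # "glossary", "parallel", "note", or "translation"
    key: str       # transliterated text the entry is about
    content: str   # gloss, scholar note, or validated English rendering


@dataclass
class KnowledgeBase:
    entries: list[Entry] = field(default_factory=list)

    def retrieve(self, transliteration: str, k: int = 3) -> list[Entry]:
        """Return up to k entries sharing the most sign readings with the query."""
        query = tokens(transliteration)
        scored = [(len(query & tokens(e.key)), e) for e in self.entries]
        hits = [(score, e) for score, e in scored if score > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in hits[:k]]


def build_prompt(transliteration: str, context: list[Entry]) -> str:
    """Assemble retrieved context and the tablet text into one prompt string."""
    lines = [f"- ({e.kind}) {e.key}: {e.content}" for e in context]
    return (
        "Context from the knowledge base:\n"
        + "\n".join(lines)
        + f"\n\nTranslate this transliteration: {transliteration}"
    )


# Sample data (hypothetical knowledge base contents).
kb = KnowledgeBase()
kb.entries.append(Entry("glossary", "lugal", "king"))
kb.entries.append(Entry("glossary", "e2", "house, temple"))
kb.entries.append(Entry("glossary", "dingir", "god"))

context = kb.retrieve("lugal-e e2 mu-du3")
prompt = build_prompt("lugal-e e2 mu-du3", context)
print(prompt)
```

Only the entries for lugal and e2 reach the prompt, because dingir does not occur in the line. The model itself is untouched; what it sees is what changes.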

The pipeline

How a translation comes to be.

When we ask the engine to translate a tablet, it runs through six steps:

  1. Fetch transliteration

    The Latin-alphabet rendering of the cuneiform signs, drawn from CDLI or ORACC where it exists.

  2. Retrieve context

    Similar passages we have already translated, glossary entries for the words appearing here, the historical period's conventions, scholar notes on adjacent tablets.

  3. Prompt Claude

    A structured prompt assembles all of the above and asks for a translation, a confidence assessment, and an explanation of any uncertain terms.

  4. Receive translation + confidence + reasoning

    Claude returns a translation, flags words it is unsure about, and gives its reasoning where it diverges from the standard rendering.

  5. Display with confidence indicator

    The user sees the translation alongside a visible confidence badge (verified / high / medium / low / experimental) and the source of the translation.

  6. Capture scholar feedback

    When a specialist confirms or corrects a translation, the result enters a 'verified' tier and feeds future retrievals, making subsequent translations of related tablets more accurate.
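The six steps can be sketched as one small orchestration function plus two helpers. This is a sketch under stated assumptions, not the engine's real implementation: every name here (`Result`, `translate_tablet`, `display`, `capture_feedback`, the tablet ID `T-0001`) is hypothetical, and the fetch, retrieve, and model steps are passed in as plain callables so the example runs without network access or an API key. In practice the `model` callable would wrap a call to Claude.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Result:
    tablet_id: str
    translation: str
    confidence: str   # "verified", "high", "medium", "low", or "experimental"
    reasoning: str
    source: str


def translate_tablet(
    tablet_id: str,
    fetch: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    model: Callable[[str], dict],
) -> Result:
    transliteration = fetch(tablet_id)            # 1. fetch transliteration
    context = retrieve(transliteration)           # 2. retrieve context
    prompt = "\n".join(                           # 3. build the structured prompt
        ["Context:", *context, "Transliteration:", transliteration,
         "Return a translation, a confidence level, and your reasoning."]
    )
    reply = model(prompt)                         # 4. translation + confidence + reasoning
    return Result(tablet_id, reply["translation"], reply["confidence"],
                  reply["reasoning"], source="engine")


def display(result: Result) -> str:               # 5. show the badge and the source
    return f"[{result.confidence}] {result.translation} (source: {result.source})"


def capture_feedback(result: Result, corrected: str, verified: dict[str, str]) -> Result:
    """6. Store a scholar's correction so later retrievals can draw on it."""
    verified[result.tablet_id] = corrected
    return replace(result, translation=corrected, confidence="verified", source="scholar")


# Stand-ins for the real data sources and the real model call.
transliterations = {"T-0001": "lugal-e e2 mu-du3"}
verified: dict[str, str] = {}


def stub_model(prompt: str) -> dict:
    return {"translation": "The king built the house.",
            "confidence": "medium",
            "reasoning": "e2 may mean 'house' or 'temple'."}


draft = translate_tablet("T-0001", transliterations.__getitem__,
                         lambda text: ["lugal: king", "e2: house, temple"], stub_model)
final = capture_feedback(draft, "The king built the temple.", verified)
print(display(draft))
print(display(final))
```

Passing the steps in as callables is a design choice for the sketch: it keeps each step replaceable and makes the flow testable with stubs.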

The compounding effect

The project itself becomes the brain.

On day one, the engine has no domain knowledge beyond what Claude was trained on. Useful, but generic.

On day a hundred, the engine has hundreds of validated translations, a working glossary of recurring Sumerian terms, cross-references across periods, and the curated notes of every scholar who has corrected something. Every new translation request retrieves from this richer base, so each new translation has more validated material to draw on than the last.

On day a thousand, if this project survives, the knowledge base will be a serious resource in its own right — usable by other scholars, by other tools, by the next person who tries to do something like this.

The model does not get smarter. The system does. That distinction matters.
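A toy illustration of that distinction, with loud caveats: `frozen_model` below is a hypothetical stand-in, not Claude, and a dictionary lookup is far simpler than what a language model does. The point it shows is narrow. The function never changes between the two calls, yet its output improves because the knowledge base passed to it has grown.

```python
def frozen_model(signs: list[str], glossary: dict[str, str]) -> str:
    """Stand-in for a model with fixed weights: output depends only on the input."""
    return " ".join(glossary.get(sign, f"[{sign}?]") for sign in signs)


line = ["lugal", "dingir", "e2"]
glossary: dict[str, str] = {}

day_one = frozen_model(line, glossary)

# Validated entries accumulate in the knowledge base; the function is untouched.
glossary.update({"lugal": "king", "dingir": "god"})
later = frozen_model(line, glossary)

print(day_one)  # [lugal?] [dingir?] [e2?]
print(later)    # king god [e2?]
```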

The experimental lab

From photo to translation — the harder ambition.

The pipeline above starts from a transliteration that already exists. Most catalogued tablets have one. But the deeper ambition is to handle tablets that have never been transliterated — to take a photograph of a tablet and return, eventually, a translation.

That requires three layers, each at a different level of scientific maturity:

  • Sign recognition: identifying cuneiform signs in an image. Active research; open models like DeepScribe (University of Chicago) work well on clean tablets, less well on damaged ones.
  • Transliteration: assembling identified signs into a Latin-alphabet sequence. Largely a solved problem for known sign repertoires.
  • Translation: turning that sequence into English. Most developed for Akkadian (Akkademia, Tel Aviv University and Ariel University, 2023). Much harder for Sumerian, a language isolate with less training data, less commercial interest, and nothing yet comparable to Akkademia.

For now, the photo-to-translation pipeline is not in production. It will arrive as an explicitly experimental feature, with outputs marked accordingly. Building it from scratch would be a doctoral-level project; integrating existing open models and stacking Claude on top is a v2 effort.
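One way the three layers might be chained is sketched below. Nothing here is in production, and every name is an assumption for illustration: `Stage`, `photo_to_translation`, and the three placeholder lambdas stand in for wrappers around existing open models, whose real interfaces are not shown. The one firm rule the sketch encodes is that whatever comes out is badged experimental.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Stage:
    name: str
    maturity: str                 # maturity note, paraphrased from the list above
    run: Callable[[Any], Any]


def photo_to_translation(image: Any, stages: list[Stage]) -> dict:
    """Run the stages in order and mark the result as experimental."""
    value = image
    for stage in stages:
        value = stage.run(value)
    return {
        "translation": value,
        "badge": "experimental",
        "stages": [(s.name, s.maturity) for s in stages],
    }


# Placeholder stage functions; real ones would wrap existing open models.
stages = [
    Stage("sign recognition", "active research",
          lambda image: ["LUGAL", "E2"]),
    Stage("transliteration", "largely solved for known sign repertoires",
          lambda signs: " ".join(s.lower() for s in signs)),
    Stage("translation", "much harder for Sumerian",
          lambda text: f"<draft translation of '{text}'>"),
]

result = photo_to_translation(b"raw image bytes", stages)
print(result["badge"], result["translation"])
```

Keeping the maturity note on each stage means the output can report which layers it passed through and how far each one can be trusted.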

Honesty, by design

Uncertainty is a feature, not a defect.

Most popular accounts of antiquity smooth over what we don't know. Academic accounts, by contrast, are precise about it — but often inaccessible. This site tries to do both: show the reader what we know, what we suspect, and what is genuinely unknown.

Every translation on the site carries a visible badge:

  • Scholar-verified — vetted by a trained Assyriologist or drawn from an established scholarly edition.
  • High confidence — well-attested text, clear translation, drawn from open scholarly corpora.
  • Medium confidence — broadly accepted reading with some contested elements.
  • Low confidence — partial reading, disputed terms.
  • Experimental — AI-assisted hypothesis, not a substitute for trained scholarly work; treat as a starting point.
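The badge tiers map naturally onto an ordered enumeration. The sketch below is one possible encoding, not the site's actual code: the names `Badge`, `LABELS`, `render`, and `passage_badge` are hypothetical, and the rule that a passage inherits the badge of its weakest line is an assumption added for illustration.

```python
from enum import IntEnum


class Badge(IntEnum):
    # Ordered from least to most trusted so badges can be compared.
    EXPERIMENTAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERIFIED = 4


LABELS = {
    Badge.VERIFIED: "Scholar-verified",
    Badge.HIGH: "High confidence",
    Badge.MEDIUM: "Medium confidence",
    Badge.LOW: "Low confidence",
    Badge.EXPERIMENTAL: "Experimental",
}


def render(translation: str, badge: Badge) -> str:
    """Attach the visible badge label to a translation."""
    return f"{translation} [{LABELS[badge]}]"


def passage_badge(line_badges: list[Badge]) -> Badge:
    """Assumed rule: a passage is only as trustworthy as its weakest line."""
    return min(line_badges)


print(render("The king built the house.", Badge.MEDIUM))
print(passage_badge([Badge.HIGH, Badge.LOW, Badge.VERIFIED]).name)
```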

The open invitation

Help is welcome.

If you are a scholar, a developer, or simply someone who notices a mistake — please reach out. The engine is built to absorb corrections and improve from them. Every validated correction makes every future translation more accurate.

See the contact section on the about page.