What role does Gen-AI play in translation, and why would it improve the accuracy of translations? This article answers both questions by demystifying traditional machine translation and the newer, much-hyped AI translation: how they differ, and how to get the most out of each by understanding how they work. But first, let’s take a look at Raiverb’s approach to translating efficiently with AI.
Raiverb is a CAT (computer-assisted translation) tool developed from the ground up as a simple-to-use toolbox for translation and localization specialists. Its core tool is the Translation Center, which was designed around a simple idea: use Gen-AI to bring context to translations.
Why Context Matters in AI Translation
Lack of Context: What Advantage Does Context Bring?
First, let’s try to understand one thing: why is context that important?
Lack of context in AI translation means missing the key elements that define the true meaning behind words, such as tone, setting, relationships, and emotions. For example, the English phrase “I’m sorry” can have vastly different meanings depending on the context: it could be an apology, an expression of sympathy, or a polite gesture. Without knowing which one is intended, the translation could be completely off.
The context tells us which tone, choice of words, or cultural considerations are needed, so that the translation is not just linguistically correct but also culturally appropriate. Bringing in context is fundamental: in our view, this is how using AI in translation makes the most sense. It is also what human translators have relied on for decades, and the very reason they use CAT software in the first place: to see the context while translating.
By bringing in context, the goal is to improve the accuracy and consistency of translations and to avoid the common errors of classic machine translation or unsupervised generative AI.
This will be developed in a second part, but first, here is a simple overview of what happens in Raiverb, a CAT designed to bring in as much context as possible for AI to be truly effective.
Maximizing Contextual Data for AI Translation in Raiverb
Now that you understand WHY context matters most in translation, it is time to tell you the HOW.
When you run a typical job on Raiverb, this is what happens:
- 1. You import a document. The source text is split into relevant segments (called Translation Units), most often paragraphs, but not always.
- 2. Depending on the file imported, and with the user’s help, all available metadata is extracted in order to provide the maximum context to the AI: comments, IDs, translations in other languages, time tags, etc.
- 3. Each segment is then analyzed and classified according to its relevance and similarity to other segments. The software knows which segments are repeated, similar, or not related at all.
- 4. Segments and all relevant data are submitted to the AI, translated, and saved separately. Past segments can be referenced at any point if they are relevant to the segment being translated.
- 5. A glossary is created automatically depending on user instructions and/or relevance to the project.
- 6. All translations are saved and kept for later reuse.
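To make the segmentation and classification steps above more concrete, here is a minimal Python sketch. It is not Raiverb’s actual implementation: the function names, the paragraph-based splitting, the use of the standard library’s `difflib` for similarity, and the 0.7 threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def split_into_units(text: str) -> list[str]:
    """Split source text into Translation Units, here one per non-empty paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def classify(segment: str, earlier: list[str], threshold: float = 0.7) -> str:
    """Label a segment 'repeated', 'similar', or 'new' relative to earlier segments.

    The threshold is an arbitrary illustrative value, not a Raiverb setting.
    """
    best = 0.0
    for other in earlier:
        if segment == other:
            return "repeated"
        # ratio() returns a similarity score between 0.0 and 1.0
        best = max(best, SequenceMatcher(None, segment, other).ratio())
    return "similar" if best >= threshold else "new"


source = "Open the door.\n\nOpen the window.\n\nSave your game.\n\nOpen the door."
units = split_into_units(source)
# Each unit is compared only with the units that came before it
labels = [classify(unit, units[:i]) for i, unit in enumerate(units)]

for unit, label in zip(units, labels):
    print(f"{label:>8}: {unit}")
```

A real tool would likely use a more robust similarity measure, but the idea is the same: knowing which segments repeat or resemble earlier ones lets past translations be reused as context.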
Accordingly, the steps here are pretty straightforward:
- Import your content
Just like in the Translation Center, you can click “Import File” and find the source file you want to translate. Alternatively, drag-and-drop will work as well.
- Set up the content
The Importer will appear. Tell Raiverb which column is the source, and which columns are your context (IDs, comments, alternative targets, character limits, etc.).
- Import your knowledge base
This will typically include a Glossary and a Translation Memory. You can click “Import” and find the corresponding files, or simply drag and drop them.
- Set up the extra comment options
Type the guidelines you want the AI to follow (including industry, tone, style, glossary extraction instructions, etc.), either for the whole source or individually for a specific Translation Unit.
- Click on “Start”
Here we go, that’s it!
Still want to check out the detailed steps? Please refer to the User Manual here.
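The column mapping described in the setup steps can be pictured with a short Python sketch. It assumes a CSV source file; the `TranslationUnit` class, the `load_units` function, and the column names are hypothetical and only illustrate the idea of one source column plus any number of context columns.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class TranslationUnit:
    """One segment to translate plus whatever context came with it (hypothetical type)."""
    source: str
    context: dict[str, str] = field(default_factory=dict)


def load_units(csv_text: str, source_col: str, context_cols: list[str]) -> list[TranslationUnit]:
    """Read rows, keeping the chosen source column and any non-empty context columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    units = []
    for row in reader:
        context = {col: row[col] for col in context_cols if row.get(col)}
        units.append(TranslationUnit(source=row[source_col], context=context))
    return units


# Illustrative file content: an ID, the text, a comment, and a character limit
sample = (
    "id,text,comment,max_chars\n"
    "btn_ok,OK,Button label,8\n"
    "msg_bye,See you soon!,,\n"
)
units = load_units(sample, source_col="text", context_cols=["id", "comment", "max_chars"])

for unit in units:
    print(unit.source, unit.context)
```

Empty cells are dropped, so each unit carries only the context that actually exists for it.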
Machine Translation vs AI Translation: Key Differences
By maximizing the contextual data and keeping track of past translations, Raiverb produces consistent output, which makes editing easier.
What are the side benefits?
As a byproduct of how it operates, Raiverb’s way of bringing in context has two side benefits.
- 1. Raiverb lifts the common restrictions in terms of size of the projects that can be run using it.
When there is a cap on file size (most commonly 5MB-10MB in popular automatic translators such as Google or DeepL), it is hard to work on large projects with AI. A large project loaded in a browser can also be very slow. Raiverb is designed to remove that limitation.
Being a standalone, downloadable application rather than a browser-based SaaS is a deliberate design choice. SaaS tools tend to limit the scope of the projects that can be worked on. One common complaint from linguists about browser-based CAT services is that they are slow and frustrating to work with, because of network limitations and the inevitable UI constraints of input-intensive work done in a browser.
- 2. Raiverb is not tied to one particular Gen-AI service provider.
Raiverb offers a choice of several Gen-AI providers so that users have the flexibility to select the one that best fits their project needs. Because they are not locked into a single AI provider, users can experiment with different models and choose the one that delivers the best results for their specific language pair, project type, or tone.
Now that you know HOW Raiverb leverages AI for translation, you may be wondering: why not bring in context for traditional machine translators as well? Why AI only? To answer this, we must understand the major difference between the two technologies.
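One common way to avoid being tied to a single provider is to code against a small interface and plug providers in behind it. The Python sketch below shows that general pattern only. The `TranslationProvider` protocol, the `EchoProvider` stand-in, and the provider names are invented for illustration and do not describe Raiverb’s internals or any real provider’s API.

```python
from __future__ import annotations

from typing import Protocol


class TranslationProvider(Protocol):
    """Anything with a name and a translate() method can act as a provider."""
    name: str

    def translate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in for a real Gen-AI client; it only tags the prompt with its name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def translate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def run_job(provider: TranslationProvider, prompts: list[str]) -> list[str]:
    """Run the same prompts through whichever provider the user selected."""
    return [provider.translate(p) for p in prompts]


# Hypothetical provider registry; real clients would wrap actual services
providers = {name: EchoProvider(name) for name in ("provider_a", "provider_b")}
results = run_job(providers["provider_b"], ["Translate: Hello"])
print(results)
```

Because the job only depends on the interface, switching provider for a given language pair or project type is a one-line change.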
Underlying Concepts Behind Machine Translation
Translation has been one of the earliest applications of natural language models.
The underlying technological concepts behind Machine Translation and generative conversational AI like ChatGPT are the same, but the two were created for different objectives.
Traditional Machine Translation (Google Translate, DeepL) has been around for years, and has seen impressive jumps in quality since its inception. But despite this incredible progress, it has always been hindered by two major obstacles:
- 1. The inability to fully understand the context of a text;
- 2. The limitations in creative writing potential.
DeepL can deliver translations of impressive quality. It can even use custom glossaries, or process whole files. But for large projects, projects sustained over time, or projects with a lot of comments or essential metadata holding the full context of isolated strings of text, it still shows limitations.
This is because Machine Translation engines produce the most likely translation of a given text based on the very high-quality reference files they were trained on. This training data includes many nuances and subtleties in expressions and formulations. However, if you ask such an engine to translate “How are you?”, it cannot know whether to use a formal or informal “you” in French or Chinese: it doesn’t know who is talking, and who is listening.
This may come, but is not quite there yet.
On the other hand, conversational AI such as ChatGPT can do that. AI has the ability to grasp the context of a text. This is why it sometimes (but not always, and we will come to that later) appears to be better for translations that come with detailed instructions.
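To illustrate what “detailed instructions” can look like in practice, here is a small Python sketch that turns a segment and its metadata into a prompt. The prompt wording, the `build_prompt` function, and the context keys are assumptions made for this example; they are not a prescribed format for any particular AI service.

```python
from __future__ import annotations


def build_prompt(source: str, target_lang: str, context: dict[str, str]) -> str:
    """Assemble a plain-text prompt that states the context before the text itself."""
    lines = [f"Translate into {target_lang}."]
    if context:
        lines.append("Context:")
        lines.extend(f"- {key}: {value}" for key, value in context.items())
    lines.append(f"Text: {source}")
    return "\n".join(lines)


# Hypothetical metadata telling the AI who is talking and who is listening
prompt = build_prompt(
    "How are you?",
    "French",
    {"speaker": "employee", "listener": "company director", "register": "formal"},
)
print(prompt)
```

With the speaker, listener, and register spelled out, the model has what it needs to choose between a formal and an informal “you”, which is exactly the information a context-free engine lacks.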
Computer Assisted Translation, or CAT Tool
There’s nothing wrong with machine translation itself as a productivity tool to help translators boost their speed. However, it often falls short when it comes to understanding context—a critical element in producing accurate and natural translations. This is where Computer-Assisted Translation, or CAT tools come into play.
Using CAT, translators can see the content they need to translate, along with all the metadata they need in order to know more about the specific context of what they are translating.
Say you’re translating a dialogue between two characters. If you’re working on a game project, these lines of dialogue may not appear in sequence. Each may be one dialogue choice among many offered to the player. If the translator doesn’t know about the gender, the relationship with the other character(s) being addressed, the location the dialogue takes place in, or any other relevant information, it is very easy to end up with a translation that is inaccurate or looks strange.
There are whole memes on the Internet built around cold, direct translations done without context, because it can be hilarious to see characters in a movie addressing each other like they’re complete strangers after 3 hours of adventures together.
Not to mention that some languages use different words, tones, or grammatical forms depending on these situational facts. And this does not only concern dialogue: UI text in software, subtitles in a video, even PPT files can be highly situational, and may look very wrong if translated without context.
On the other hand, conversational AI such as ChatGPT has the ability to understand context. But it also has its limits.
Fundamental Limitations of Automated Translation
This is a statement concerning Translation, but also AI in general.
Many people worry about the progress of AI, but there are several reasons to expect that progress to run into limits:
First, it is expected, and now well documented, that AI, or at least Large Language Models as we know them today, will run into significant limitations because of diminishing returns on training volume: the more data you train them on, the less each increase seems to improve the overall quality of the models.
But more fundamentally, and I dare say philosophically: no matter how complex neural networks get, their architecture is built solely around completing a sequence in the most likely way, based on their training data. This data is the aggregated record of human knowledge, but the full experience of what it is to be human is more than the sum of the data created by humans.
Humans can corner themselves into absurd situations, often as a result of their own contradictory emotions, which they sometimes, hopefully, turn into jokes. Machine neural networks do not create conflicts, and they do not have contradictory emotions. They simply weigh and quantify, then complete. But they will never create jokes about themselves. There is nothing to joke about in what they do.
When they do not function as expected, they hallucinate, or crash. They do not self-correct.
This is the exact reason why they will never be able to do any type of highly creative translation. Some translations are inherently creative, or at least require understanding what the original author meant. This can differ greatly from one culture to another. While AI may have references to cultural subtleties if its training data contains any, it will never create its own subtlety.
Pairing AI with CAT Tools in Translation
Consistency Maintenance
AI translation systems often struggle to maintain consistency in style and tone, which is crucial for preserving a brand’s voice or an author’s unique style. For example, a brand’s messaging might require a formal tone in one context and a conversational tone in another. Without supervision, AI may fail to adapt appropriately, leading to inconsistent translations.
This is where CAT tools, with their translation memories (TMs) and glossaries, play a vital role. By providing AI with predefined terminology and style guidelines, CAT tools help ensure consistency across translations.
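As a rough illustration of how a translation memory supports consistency, the Python sketch below looks up a new segment against previously translated ones and returns the closest match, if any. The memory contents, the `tm_lookup` function, and the 0.8 cutoff are illustrative assumptions. Real CAT tools use their own matching algorithms and file formats.

```python
from __future__ import annotations

from difflib import get_close_matches
from typing import Optional

# Illustrative English-to-French memory; the entries are made up for this example
translation_memory = {
    "Save your progress": "Sauvegardez votre progression",
    "Load a saved game": "Charger une partie sauvegardée",
}


def tm_lookup(segment: str, tm: dict[str, str], cutoff: float = 0.8) -> Optional[tuple[str, str]]:
    """Return the closest stored (source, translation) pair, or None if nothing is close enough."""
    matches = get_close_matches(segment, list(tm), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return best, tm[best]


print(tm_lookup("Save your progress!", translation_memory))
print(tm_lookup("Quit to desktop", translation_memory))
```

A match found this way can be passed to the AI as a reference, so that a near-identical sentence is translated the same way it was before.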
Polysemy Disambiguation
AI systems frequently encounter challenges with polysemy—words that have multiple meanings. Without sufficient context, AI may choose the wrong interpretation, resulting in inaccurate translations. For instance, the English word “bank” could refer to a financial institution or the side of a river. Human oversight, combined with CAT tools, is essential to disambiguate such terms and ensure accuracy.
Handling New Terms and Jargon
AI systems often struggle with new words, technical terms, or industry-specific jargon, especially if these terms are absent from their training data. This can lead to awkward or incorrect translations. CAT tools, with their ability to integrate custom glossaries and TMs, provide a solution by equipping AI with the necessary terminology to handle specialized content.
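A simple way to picture glossary support is to detect which glossary terms occur in a segment and hand only those to the AI as required terminology. The Python sketch below does this with a case-insensitive whole-word search. The glossary entries and the `glossary_hits` function are hypothetical, and this is not a description of how any specific CAT tool matches terms.

```python
from __future__ import annotations

import re

# Illustrative English-to-French glossary for a game project (made-up entries)
glossary = {
    "hit points": "points de vie",
    "quest log": "journal de quêtes",
    "cooldown": "temps de recharge",
}


def glossary_hits(segment: str, terms: dict[str, str]) -> dict[str, str]:
    """Return the glossary entries whose source term appears in the segment."""
    hits = {}
    for term, target in terms.items():
        # \b keeps "cooldown" from matching inside a longer word
        if re.search(rf"\b{re.escape(term)}\b", segment, flags=re.IGNORECASE):
            hits[term] = target
    return hits


hits = glossary_hits("Check your Quest Log to restore hit points.", glossary)
print(hits)
```

Only the matching entries need to travel with the segment, which keeps the instructions short while still fixing the terminology that matters.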
Given these limitations, it is clear that AI cannot operate effectively in isolation. It requires supervision and contextual support to produce high-quality translations. This is where the combination of generative AI and CAT tools becomes invaluable. CAT tools provide the structured frameworks—such as translation memories, glossaries, and style guides—that AI needs to function effectively. Meanwhile, human translators bring the creativity, cultural understanding, and critical thinking necessary to verify and refine AI-generated outputs.
Conclusion
By now, it should be clear why context is everything in AI translation: it brings better accuracy and relevance to every project, something that is often lacking in machine-generated translations. The solution? Supervising AI. This is exactly why Raiverb focuses on providing as much context as possible to the AI, so that it is less likely to hallucinate, misinterpret, or generate translations that stray from our intent. In fact, one of the most valuable skills for today’s translators is understanding how these technologies work, so we can leverage them to their fullest potential and enhance our own craft.