Blog

An AI-based workflow to translate legacy content

Olivier Carrère
#AI#Translation#Technical Writing#Markdown#DeepL#GPT-4o

Translating large sets of legacy documentation is always a challenge. Professional human translation ensures quality, but it is time-consuming and costly—especially when dealing with dozens or even hundreds of Markdown files, diagrams, and embedded metadata.

graph TD %% Nodes SRC["Source (French Markdown)"] DEEPL["DeepL Translation"] GPT["GPT-4o Proofreading"] MANUAL["Manual Cleanup / Frontmatter Fix"] REVIEW["Selective Review with Git"] KEEP["Keep Changes"] DISCARD["Discard Changes"] BUILD["Build & Fix Media"] %% Flows SRC a1@--> DEEPL a2@--> GPT a3@--> MANUAL a4@--> REVIEW REVIEW a5@-->|y| KEEP REVIEW -->|n| DISCARD KEEP a6@--> BUILD BUILD a7@--> |"Next File / Iterate"|SRC a1@{ animation: slow } a2@{ animation: slow } a3@{ animation: slow } a4@{ animation: slow } a5@{ animation: slow } a6@{ animation: slow } a7@{ animation: slow } %% Styling classDef source fill:#fbe9e4,stroke:#df9277,stroke-width:2px,color:#5a2e23,font-weight:bold,rx:8,ry:8; classDef deepl fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8; classDef gpt fill:#efb9a6,stroke:#c96a52,stroke-width:2px,color:#3d1d15,font-weight:bold,rx:8,ry:8; classDef manual fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8; classDef review fill:#df9277,stroke:#b85b3f,stroke-width:2px,color:#fff,font-weight:bold,rx:8,ry:8; classDef keep fill:#98C86B,stroke:#618044,stroke-width:2px,color:#3d1d15,font-weight:bold,rx:8,ry:8; classDef discard fill:#df9277,stroke:#b85b3f,stroke-width:2px,color:#fff,font-weight:bold,rx:8,ry:8; classDef build fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8; classDef next fill:#fbe9e4,stroke:#df9277,stroke-width:2px,color:#5a2e23,font-weight:bold,rx:8,ry:8; %% Apply classes class SRC source; class DEEPL deepl; class GPT gpt; class MANUAL manual; class REVIEW review; class KEEP keep; class DISCARD discard; class BUILD build; class NEXT next; %% Link styles linkStyle default stroke:#df9277,stroke-width:2px;

For my Redaction Technique legacy website, I set up an AI-based iterative workflow that automates much of the heavy lifting while still leaving space for human refinement.

Plastic colorful springs illustrating iterative workflows

Like most efficient processes, it relies on iteration: a loop that combines different AI tools to enable steady, incremental publishing. Human intervention remains critical at every stage—guiding the process, correcting errors, and keeping the results on track. The final step is a thorough human correction, which you can sometimes defer for legacy or non-critical content until analytics confirm that the material is worth the extra investment.

Automatic translation with DeepL

Automatic translation with DeepL: DeepL logo

The first step is to generate raw English translations of all French Markdown files. To do this, write a simple Python script that:

This produces usable English content quickly, but the results often contain broken Markdown tables, mismatched frontmatter, or literal translations that sound clumsy.

Manual cleanup

Before moving on, fix structural issues in the translated Markdown:

This ensures the files build into the website without errors.

AI Proofreading with GPT-4o

IA Proofreading with GPT-4o: GPT logo

DeepL produces decent raw translations, but the style often needs polishing. To automate this, create another Python script that sends the translated file to GPT-4o with a strict editing prompt:

You are an expert technical writing editor. The text is about technical writing, DITA, and structured authoring. Fix inconsistencies, unprofessional style, and poor French-to-English translations. Keep Markdown formatting intact. Return only the corrected text, without explanations.

To stay in control, have the script process one file at a time and then stop. This makes it easier to review and cherry-pick changes.

Selective review with Git

Selective review with Git: Git logo

Review edits selectively with:

$ git add -p

Git shows each change hunk by hunk and lets you decide whether to stage it (y = yes, n = no). You can split hunks with s, edit them manually with e, or quit anytime with q.

graph LR A[Changes] --> B{git add -p} B -->|y| C[Keep] B -->|n| D[Discard] %% Define warm shades based on #df9277 classDef input fill:#fbe9e4,stroke:#df9277,stroke-width:2px,color:#5a2e23,font-weight:bold,rx:8,ry:8; classDef decision fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8; classDef keep fill:#98C86B,stroke:#618044,stroke-width:2px,color:#3d1d15,font-weight:bold,rx:8,ry:8; classDef discard fill:#df9277,stroke:#b85b3f,stroke-width:2px,color:#fff,font-weight:bold,rx:8,ry:8; %% Apply styles class A input; class B decision; class C keep; class D discard; %% Style links in warm tone linkStyle default stroke:#df9277,stroke-width:2px;

This workflow is ideal when a file contains unrelated edits—such as AI-generated suggestions—because it gives you full control over what goes into your commit history.

diff —git a/communication-technique.md b/communication-technique.md index d5b0c9b8..7a5632af 100644 +++ b/communication-technique.md --- a/communication-technique.md @@ -1,31 +1,30 @@ +The goal of technical communication is to convert prospects into -The goal of technical communication is to turn prospects into satisfied customers. The technical writer provides the market with (1/1) Stage this hunk [y,n,q,a,d,s,e,p,?]?

Stage the changes you want, commit them with:

$ git commit -m "commit message"

Then discard the rest with:

$ git reset --hard

Build the site and fix media

When the text is ready, build the site locally to catch any remaining errors. Common fixes include:

Rinse and repeat

The process is iterative:

  1. Run the proofreading script on the next Markdown file.
  2. Select changes via git add -p.
  3. Rebuild and adjust assets.

Repeat until the full documentation set is translated.

This workflow isn’t a replacement for professional translation, but it’s “good enough” for legacy content where perfect nuance is less critical. For high-visibility or customer-facing material, plan on a final round of human proofreading.

With this pipeline, you can translate and modernize a large corpus of technical documentation efficiently—combining the strengths of DeepL, GPT, and a bit of manual oversight.

← Back to Blog