
Boost Documentation Efficiency: How YAML Outperforms XML, Markdown, and Databases

Olivier Carrère
#YAML #OpenAPI #Schema #Structured Data #Technical Writing #Content Reuse #Validation

YAML is a lightweight, human-readable data format that simplifies configuration files and data exchange. Its clarity, flexibility, and efficiency make it increasingly popular among developers, often outperforming XML, Markdown, and some database solutions in modern applications.

You can also leverage YAML for structured documentation.

Imaginary Scenario: Choosing the Right Engine Oil

Imagine you’re an engine oil manufacturer. Every day, customers ask you which oil is best for their engines. Some have one-cylinder engines, others have multi-cylinder setups. Price, viscosity, and compatibility all matter—but helping them make the right choice isn’t just about knowing your products. It’s about how you store and present that information.

At first, it might seem simple. You could create a quick reference table:

| Oil Type | Brand | Use | Price | Viscosity Grade |
| --- | --- | --- | --- | --- |
| Primary oil | A1X | One-cylinder engines | 15 | 0W-20 |
| Secondary oil | B2Z | Two-cylinder engines | 17 | 5W-30 |

Looks neat, right? But as your product line grows, so does the complexity. Adding new oils, updating prices, or including extra metadata like cylinder count or warranty quickly turns into a maintenance nightmare.

[Image: Bottles of Pennzoil motor oil on a store shelf.]

The XML temptation

Some teams start with DITA reference XML, thinking structure solves the problem:

DITA reference XML
<reference id="oil-types">
  <title>Oil types</title>
  <shortdesc>You will find below the recommended oil types.</shortdesc>
  <refbody>
    <section>
      <title>Primary oil</title>
      <ul>
        <li>Brand: A1X</li>
        <li>Use: One-cylinder engines</li>
        <li>Price: 15</li>
        <li>Viscosity grade: 0W-20</li>
      </ul>
    </section>
    <section>
      <title>Secondary oil</title>
      <ul>
        <li>Brand: B2Z</li>
        <li>Use: Two-cylinder engines</li>
        <li>Price: 17</li>
        <li>Viscosity grade: 5W-30</li>
      </ul>
    </section>
  </refbody>
</reference>

It seems neat at first. Each oil has a dedicated section. But the moment prices change, new products are added, or you want to track extra attributes, the XML becomes cumbersome.

| Issue | Description |
| --- | --- |
| Hardcoded Values | Every data point is embedded in XML. Updates require manual changes, which is error-prone. |
| Mixing Data and Presentation | `<ul>` and `<li>` combine field names and values, making automated processing difficult. |
| Poor Scalability | Adding oils or metadata requires repeating XML structures. |
| Lack of Unique Identifiers | Sections are distinguished by titles only, risking breakage in workflows if names change. |
| Limited Reusability | Copying sections across documents increases the risk of inconsistencies. |
| Ambiguous Values | `<li>Price: 15</li>` lacks units or currency. |
| No Validation for Consistent Structure | Missing fields reduce data quality over time. |

Hardcoded XML works for tiny lists—but it quickly becomes brittle as content grows.
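To make the "mixing data and presentation" issue concrete, here is a minimal Python sketch that tries to recover fields from a shortened copy of the DITA sample. The `extract_oils` helper is hypothetical and written only for this illustration. Because each label and value share one text node, the script has to split strings on a colon, and every value comes back as untyped text.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the DITA sample above (one section only).
DITA = """<reference id="oil-types">
  <title>Oil types</title>
  <refbody>
    <section>
      <title>Primary oil</title>
      <ul>
        <li>Brand: A1X</li>
        <li>Use: One-cylinder engines</li>
        <li>Price: 15</li>
        <li>Viscosity grade: 0W-20</li>
      </ul>
    </section>
  </refbody>
</reference>"""


def extract_oils(xml_text: str) -> list[dict[str, str]]:
    """Hypothetical helper: recover label/value pairs from each <li>."""
    root = ET.fromstring(xml_text)
    oils = []
    for section in root.iter("section"):
        oil = {"title": section.findtext("title", default="")}
        for item in section.iter("li"):
            # Label and value live in the same text node, so string
            # splitting is the only option; a renamed label breaks it.
            label, _, value = (item.text or "").partition(":")
            oil[label.strip()] = value.strip()
        oils.append(oil)
    return oils


oils = extract_oils(DITA)
print(oils[0])
```

Note that `Price` is returned as the string `"15"`: nothing in the markup says whether it is a number, or in which currency.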


Markdown tables: simple but limiting

Markdown tables are easier to read:

| Oil Type      | Brand | Use                  | Price | Viscosity Grade |
| ------------- | ----- | -------------------- | ----- | --------------- |
| Primary oil   | A1X   | One-cylinder engines | 15    | 0W-20           |
| Secondary oil | B2Z   | Two-cylinder engines | 17    | 5W-30           |

They render nicely and are human-friendly—but they carry hidden maintainability problems:

| Issue | Description |
| --- | --- |
| Hardcoded Data | Manual updates are required for any change, which is error-prone. |
| Lack of Semantic Structure | Field names and values are not machine-readable, making automation difficult. |
| Poor Scalability | Adding new oils or metadata requires editing the table structure. |
| No Unique Identifiers | Rows are identified only by “Oil Type,” making programmatic referencing unreliable. |
| Ambiguities | Values like Price: 15 lack units, which can cause misinterpretation. |
| Limited Reusability | Tables are hard to reuse across multiple documents, creating redundancy. |

Warning: The limits of Markdown tables

Markdown was designed so that the source should be almost as human-readable as the output, whether rendered as HTML, PDF, or another format. But tables are an exception: they introduce several challenges.

Long lines quickly become a nightmare to read and edit, as text editors wrap them and make it hard to distinguish one row from another. The visual benefit of tables for readers—having columns neatly aligned on the same vertical lines—turns into a problem for authors.

| Oil Type | Brand | Use  | Price | Viscosity Grade |
| - | - | - | - | - |
| Primary oil | A1X | One-cylinder engines | 15 | 0W-20 |
| Secondary oil | B2Z | Two-cylinder engines | 17 | 5W-30 |

Markdown source table

Some text editors can automatically realign table columns when you edit a cell, but this reformats the whole table. The result? A very noisy Git diff, where Git flags every row as changed even though only a few whitespace characters were adjusted. Conversely, if you skip realignment and let the columns drift out of line, the source table becomes hard for humans to parse and maintain.
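The noise is easy to reproduce. The sketch below uses Python's standard `difflib` module as a stand-in for Git's line-based comparison (an approximation, not Git's exact algorithm) on a reduced three-column table. One row is added, and the columns are realigned the way an auto-formatting editor would.

```python
import difflib

before = [
    "| Oil Type    | Brand | Price |",
    "| ----------- | ----- | ----- |",
    "| Primary oil | A1X   | 15    |",
]

# One new row with a longer name, plus the padding an editor would re-apply.
after = [
    "| Oil Type      | Brand | Price |",
    "| ------------- | ----- | ----- |",
    "| Primary oil   | A1X   | 15    |",
    "| Secondary oil | B2Z   | 17    |",
]

diff = list(difflib.unified_diff(before, after, "before.md", "after.md", lineterm=""))

# Skip the "---" / "+++" file headers when counting changed lines.
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]

print("\n".join(diff))
print(f"{len(removed)} lines removed, {len(added)} lines added for one new row")
```

Every existing line is reported as removed and re-added, so a reviewer cannot tell at a glance that only one row is new.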

Markdown is fine for static pages—but as your documentation grows, you need a better approach.


Databases: flexible but costly

Databases offer structured storage and queries, but repeated queries can create performance issues:

| Issue | Description |
| --- | --- |
| Performance Overhead | Each query consumes resources; repeated queries slow the system. |
| Increased Network Traffic | Multiple queries generate unnecessary load. |
| Higher Costs | Cloud usage costs rise with frequent queries. |
| Data Consistency Issues | Data changes between queries can produce inconsistent results. |
| Scalability Problems | Heavy query repetition strains the database. |
| Redundant Computation | Expensive operations are repeated unnecessarily. |
| Increased Latency | Users notice slower response times for each query. |

Mitigation strategies like caching, batching, materialized views, and optimized indexing help—but they add complexity.
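As a rough illustration of that trade-off, the sketch below caches a repeated lookup with `functools.lru_cache` in front of an in-memory SQLite database. The table layout and the `price_for` function are assumptions made up for this example, not part of any real system.

```python
import sqlite3
from functools import lru_cache

# Hypothetical in-memory table holding the two sample oils.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oils (type TEXT, brand TEXT, price REAL)")
conn.executemany(
    "INSERT INTO oils VALUES (?, ?, ?)",
    [("Primary oil", "A1X", 15.0), ("Secondary oil", "B2Z", 17.0)],
)

stats = {"queries": 0}  # counts how often the database is actually hit


@lru_cache(maxsize=128)
def price_for(brand: str) -> float:
    """Look up a price; repeated calls for the same brand skip the database."""
    stats["queries"] += 1
    row = conn.execute("SELECT price FROM oils WHERE brand = ?", (brand,)).fetchone()
    if row is None:
        raise KeyError(brand)
    return row[0]


for _ in range(100):
    price_for("A1X")

print(stats["queries"])
```

The cache removes 99 of the 100 queries, but it also introduces the added complexity mentioned above: when a price changes, something has to call `price_for.cache_clear()`, or readers keep seeing the stale value.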


YAML: readable, structured, and scalable

This is where YAML shines. It’s human-readable, hierarchical, and structured, making it ideal for documentation that needs to scale.

id: oil-types
title: Oil types
shortdesc: Recommended oil types
properties:
  headers:
    type: Type
    value: Brand
    usage: Use
  rows:
    - type: Primary oil
      value: A1X
      usage: One-cylinder engines
    - type: Secondary oil
      value: B2Z
      usage: Two-cylinder engines

Rendered in Markdown, it produces a clean table:

| Type | Brand | Use |
| --- | --- | --- |
| Primary oil | A1X | One-cylinder engines |
| Secondary oil | B2Z | Two-cylinder engines |
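The rendering step can be very small. The following Python sketch assumes the YAML file has already been loaded into a dictionary by a YAML parser; the dictionary is written inline here so the example runs without extra dependencies. The `to_markdown_table` name is hypothetical.

```python
# Mirrors the YAML above; in a real build this would come from a YAML parser.
data = {
    "id": "oil-types",
    "title": "Oil types",
    "shortdesc": "Recommended oil types",
    "properties": {
        "headers": {"type": "Type", "value": "Brand", "usage": "Use"},
        "rows": [
            {"type": "Primary oil", "value": "A1X", "usage": "One-cylinder engines"},
            {"type": "Secondary oil", "value": "B2Z", "usage": "Two-cylinder engines"},
        ],
    },
}


def to_markdown_table(doc: dict) -> str:
    """Render the headers and rows of the document as a Markdown pipe table."""
    headers = doc["properties"]["headers"]
    keys = list(headers)  # column order follows the order of the headers mapping
    lines = [
        "| " + " | ".join(headers[key] for key in keys) + " |",
        "| " + " | ".join("---" for _ in keys) + " |",
    ]
    for row in doc["properties"]["rows"]:
        # A missing field renders as an empty cell instead of raising.
        lines.append("| " + " | ".join(str(row.get(key, "")) for key in keys) + " |")
    return "\n".join(lines)


markdown = to_markdown_table(data)
print(markdown)
```

Because the column order and labels come from `headers`, adding or removing a column is a data change, not a table rewrite.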

Benefits of YAML

| Feature | Details |
| --- | --- |
| Separation of Data and Presentation | Data can be reused in tables, lists, APIs, or UIs without touching formatting. |
| Structured and Predictable | Consistent schema reduces human error and simplifies automation. |
| Easy to Extend | Add new oils or metadata without redesigning the structure. |
| Supports Automation | Scripts or static site generators can consume YAML directly. |
| Unique Identifiers | Top-level id allows reliable referencing across documents. |
| Improved Readability and Maintainability | Keys are self-explanatory, easier to understand than inline XML or Markdown tables. |
| Scalable for Large Datasets | Works equally well for 5 or 500 rows; updates and validation remain straightforward. |

Versatility

One of the key advantages of using structured data formats like YAML is versatility. The same dataset can be rendered in multiple ways—lists, tables, or even more complex layouts—without changing the source file.

Example: Display your data as a simple list

Oil types

You will find below the recommended oil types.

  • Primary oil
    • Brand: A1X
    • Usage: One-cylinder engines
    • Viscosity grade: 0W-20
    • Price: $15.00
  • Secondary oil
    • Brand: B2Z
    • Usage: Two-cylinder engines
    • Viscosity grade: 5W-30
    • Price: $17.00
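A list view like this one could be produced by a few lines of code. The sketch below is one possible way to do it in Python; the row keys follow the `OilRow` fields that appear later in this post, while the `LABELS` mapping and the `to_nested_list` function are assumptions made for the example.

```python
# Rows as they might look once loaded from the YAML source (inlined here).
rows = [
    {"type": "Primary oil", "name": "A1X", "usage": "One-cylinder engines",
     "viscosity_grade": "0W-20", "price": 15.0},
    {"type": "Secondary oil", "name": "B2Z", "usage": "Two-cylinder engines",
     "viscosity_grade": "5W-30", "price": 17.0},
]

# Hypothetical display labels, in the order they should appear.
LABELS = {
    "name": "Brand",
    "usage": "Usage",
    "viscosity_grade": "Viscosity grade",
    "price": "Price",
}


def to_nested_list(rows: list[dict]) -> str:
    """Render each row as a bullet with one nested bullet per field."""
    lines = []
    for row in rows:
        lines.append(f"- {row['type']}")
        for key, label in LABELS.items():
            value = row[key]
            if key == "price":
                # The currency symbol is a presentation choice made here,
                # not something stored in the data.
                value = f"${value:.2f}"
            lines.append(f"  - {label}: {value}")
    return "\n".join(lines)


listing = to_nested_list(rows)
print(listing)
```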

Or you can present it as a styled two-column table on desktop (that automatically collapses into individual labeled rows inside cards on mobile):

Example: Display your data as a styled two-column table

Oil types

You will find below the recommended oil types.

| Brand | Details |
| --- | --- |
| A1X | Primary oil, $15.00, 1, 0W-20 |
| B2Z | Secondary oil, $17.00, 2, 5W-30 |

If you need a richer display, you can even switch to a five-column table on desktop (that automatically stacks into a single-column card view on mobile)—all without editing the source YAML.

flowchart LR
  %% Nodes
  B["YAML data"]
  F1["Title"]
  F2["Short<br/>description"]
  F3["Table<br/>header"]
  H["One table row<br/>per YAML entry"]

  %% Row values (documents) in one subgraph
  subgraph RowValues[ ]
    direction LR
    J["Primary<br/>oil"]
    K["A1X"]
    L["..."]
  end

  %% Row headers in one subgraph
  subgraph RowHeaders[ ]
    direction LR
    H1["Type"]
    H2["Brand"]
    H3["..."]
  end

  %% Flows with animated links
  B a1@--> F1
  B a2@--> F2
  B a3@--> F3
  B a4@--> H
  H a5@-.-> J
  H a6@-.-> K
  H a7@-.-> L
  F3 a8@--> H1
  F3 a9@--> H2
  F3 a10@--> H3

  %% Animation speeds
  a1@{ animation: slow }
  a2@{ animation: slow }
  a3@{ animation: slow }
  a4@{ animation: fast }
  a5@{ animation: slow }
  a6@{ animation: slow }
  a7@{ animation: slow }
  a8@{ animation: slow }
  a9@{ animation: slow }
  a10@{ animation: slow }

  %% Styling
  classDef topic fill:#fbe9e4,stroke:#df9277,stroke-width:2px,color:#5a2e23,font-weight:bold,rx:8,ry:8;
  classDef step fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8;
  classDef substep fill:#3498dc,stroke:#1f5a82,stroke-width:2px,color:#ffffff,font-weight:bold,rx:8,ry:8;
  classDef documents fill:#ebf0f1,stroke:#4c4c4c,stroke-width:2px,color:#000000,font-weight:bold,rx:8,ry:8;

  %% Apply classes
  class B topic;
  class F1,F2,F3 step;
  class H1,H2,H3 substep;
  class J,K,L documents;

  %% Link styles
  linkStyle default stroke:#d97d61,stroke-width:2px;

Processing Oil Data: From YAML Import to Dynamic HTML Table

Example: Display your data as a dynamic HTML table
| Type ▲▼ | Brand ▲▼ | Cylinders ▲▼ | Viscosity grade ▲▼ | Price ▲▼ |
| --- | --- | --- | --- | --- |
| Primary oil | A1X | One-cylinder engines | 0W-20 | $15.00 |
| Secondary oil | B2Z | Two-cylinder engines | 5W-30 | $17.00 |

This flexibility allows your documentation to adapt to different contexts—whether it’s a quick reference list for users, a detailed table for technical readers, or a complex component in a UI. By separating content from presentation, you can maintain a single source of truth while offering multiple ways to consume the data.
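The same rows can also feed an HTML view. The sketch below is a simplified, server-side Python approximation of a sortable table; it is not the Astro component used on this site, and the `COLUMNS` list and `to_html_table` function are hypothetical names chosen for the example.

```python
from html import escape

# Rows as they might look once loaded from the YAML source (inlined here).
rows = [
    {"type": "Primary oil", "name": "A1X", "viscosity_grade": "0W-20", "price": 15.0},
    {"type": "Secondary oil", "name": "B2Z", "viscosity_grade": "5W-30", "price": 17.0},
]

# Hypothetical (key, label) pairs defining column order and headings.
COLUMNS = [
    ("type", "Type"),
    ("name", "Brand"),
    ("viscosity_grade", "Viscosity grade"),
    ("price", "Price"),
]


def to_html_table(rows: list[dict], sort_by: str = "price", descending: bool = False) -> str:
    """Render rows as an HTML table, sorted by one column."""
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=descending)
    head = "".join(f"<th>{escape(label)}</th>" for _, label in COLUMNS)
    body = []
    for row in ordered:
        cells = []
        for key, _ in COLUMNS:
            value = f"${row[key]:.2f}" if key == "price" else str(row[key])
            cells.append(f"<td>{escape(value)}</td>")  # escape to keep markup safe
        body.append("<tr>" + "".join(cells) + "</tr>")
    return f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(body)}</tbody></table>"


html_table = to_html_table(rows, sort_by="price", descending=True)
print(html_table)
```

Sorting works on the numeric `price` value, not on the formatted string, which is only possible because the source keeps data and presentation apart.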


Easier diffs and cleaner version control

One hidden drawback of Markdown tables is how poorly they behave in version control. If you remove or reorder a column, Git compares the file line by line—producing a messy, unreadable diff where every row appears changed.

Git diff
diff --git a/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx b/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
index 61dadd8..0f70057 100644
--- a/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
+++ b/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
@@ -26,10 +26,10 @@ Imagine you’re an engine oil manufacturer. Every day, customers ask you which

 At first, it might seem simple. You could create a quick reference table:

-| Oil Type      | Brand | Use                  | Price | Viscosity Grade |
-| ------------- | ----- | -------------------- | ----- | --------------- |
-| Primary oil   | A1X   | One-cylinder engines | 15    | 0W-20           |
-| Secondary oil | B2Z   | Two-cylinder engines | 17    | 5W-30           |
+| Oil Type      | Use                  | Price | Viscosity Grade |
+|---------------|----------------------|-------|-----------------|
+| Primary oil   | One-cylinder engines | 15    | 0W-20           |
+| Secondary oil | Two-cylinder engines | 17    | 5W-30           |

Tip: Mitigate the issue with third-party tools

To partially improve the readability of table diffs, you can use tools like GitHub Desktop or git diff --word-diff, or define a custom diff driver in .gitattributes.

These approaches make diffs more palatable to human readers, but under the hood, Git still considers entire lines as changed.

[Image: GitHub Desktop screenshot showing a more human-readable diff, even though Git internally still sees full-line changes.]

GitHub Desktop and the command line show the rewritten rows in green, as additions, even though the only change was a deletion. This reflects Git’s line-based diffing: any edit to a line is recorded as the old line removed and a new line added.

By contrast, when tables are generated from a YAML source, all you need to do is remove the corresponding key-value pairs in the YAML file.

Git diff
diff --git a/src/data/oil-types.yaml b/src/data/oil-types.yaml
index 11eef57..87c04df 100644
--- a/src/data/oil-types.yaml
+++ b/src/data/oil-types.yaml
@@ -5,24 +5,20 @@ shortdesc: You will find below the recommended oil types.
 properties:
   headers:
     type: Type
-    name: Brand
     usage: Use
   row_schema:
     type: str
-    name: str
     price: float
     cylinders: int
     viscosity_grade: str
   rows:
     - type: Primary oil
-      name: A1X
       price: 15.0
       cylinders: 1
       viscosity_grade: 0W-20
     - type: Secondary oil
-      name: B2Z
       price: 17.0
       cylinders: 2
       viscosity_grade: 5W-30

That change is perfectly legible in a Git diff.

An even better approach is to edit the script that generates the table: often just a few lines of code, producing a simple, human-readable change that’s easier to review, merge, and revert.

Git diff
diff --git a/src/components/table.astro b/src/components/table.astro
index c05af99..d594203 100644
--- a/src/components/table.astro
+++ b/src/components/table.astro
@@ -3,7 +3,6 @@ import data from "../data/oil-types.yaml";
 type OilRow = {
   type: string;
-  name: string;
   usage: string;
   viscosity_grade: string;
   price: number;
@@ -34,7 +33,6 @@ const wordsToNumber: Record<string, number> = Object.fromEntries(
     <thead>
       <tr>
         <th data-key="type" data-type="text">Type <span class="arrow">▲▼</span></th>
-        <th data-key="name" data-type="text">Brand <span class="arrow">▲▼</span></th>
         <th data-key="cylinders" data-type="cylinders">Cylinders <span class="arrow">▲▼</span></th>
         <th data-key="viscosity_grade" data-type="text">Viscosity grade <span class="arrow">▲▼</span></th>
         <th data-key="price" data-type="number">Price <span class="arrow">▲▼</span></th>
@@ -44,7 +42,6 @@ const wordsToNumber: Record<string, number> = Object.fromEntries(
       {rows.map((row) => (
         <tr>
           <td data-label="Type">{row.type}</td>
-          <td data-label="Brand">{row.name}</td>
           <td data-label="Cylinders">{numberToWords(row.cylinders)}-cylinder engines</td>
           <td data-label="Viscosity grade">{row.viscosity_grade}</td>
           <td data-label="Price">${row.price.toFixed(2)}</td>

Growing with your YAML

Our oil-types.yaml started as a simple table but evolved into a single source of truth.

By enforcing strong typing (floats for prices, integers for cylinder counts), data stays consistent. A central schema, such as the row_schema block visible in the YAML diff above, keeps field definitions DRY (Don’t Repeat Yourself) and enables automated validation.
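Automated validation against such a schema does not need much code. Here is a minimal Python sketch, assuming the `row_schema` type names (`str`, `float`, `int`) map to the Python types of the same name and that the rows have already been loaded from YAML. The `validate_rows` function is hypothetical, and the second row is deliberately broken to show the output.

```python
# Maps the type names used in row_schema to Python types (an assumption).
TYPE_NAMES = {"str": str, "float": float, "int": int}

row_schema = {
    "type": "str",
    "name": "str",
    "price": "float",
    "cylinders": "int",
    "viscosity_grade": "str",
}


def validate_rows(rows: list[dict], schema: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for index, row in enumerate(rows):
        for field, type_name in schema.items():
            if field not in row:
                errors.append(f"row {index}: missing field '{field}'")
                continue
            value = row[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPE_NAMES[type_name]):
                errors.append(
                    f"row {index}: '{field}' should be {type_name}, "
                    f"got {type(value).__name__}"
                )
        for field in sorted(row.keys() - schema.keys()):
            errors.append(f"row {index}: unexpected field '{field}'")
    return errors


rows = [
    {"type": "Primary oil", "name": "A1X", "price": 15.0,
     "cylinders": 1, "viscosity_grade": "0W-20"},
    # Deliberately broken: price is a string and viscosity_grade is missing.
    {"type": "Secondary oil", "name": "B2Z", "price": "17", "cylinders": 2},
]

errors = validate_rows(rows, row_schema)
for problem in errors:
    print(problem)
```

The check is strict on purpose: a price written as `15` would load as an integer and be rejected, which forces authors to write `15.0`. Whether that strictness is desirable is a design choice; a more lenient validator could accept integers wherever floats are expected.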


Why a structured source is better than embedded tables

Tables are a convenient way to present structured data in a familiar and scannable layout for users. However, generating user-facing tables directly from Markdown embedded in source files is not the most efficient approach.

A stronger alternative is to extract data at build time from a structured source, such as a YAML single source of truth. This approach allows you to render the same information in multiple ways, each tailored to the target medium and the specific needs of your audience—whether that’s a Markdown table in documentation, a JSON payload for an API, or a dynamic HTML component in a UI.
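For instance, turning the same source into a JSON payload takes only the standard library. This Python sketch assumes the YAML has been loaded into a dictionary (inlined here); the payload shape and the `to_api_payload` name are illustrative assumptions, not an existing API.

```python
import json

# Stand-in for the parsed YAML source of truth.
data = {
    "id": "oil-types",
    "title": "Oil types",
    "properties": {
        "rows": [
            {"type": "Primary oil", "name": "A1X", "price": 15.0,
             "cylinders": 1, "viscosity_grade": "0W-20"},
            {"type": "Secondary oil", "name": "B2Z", "price": 17.0,
             "cylinders": 2, "viscosity_grade": "5W-30"},
        ],
    },
}


def to_api_payload(doc: dict) -> str:
    """Serialize the document as JSON, keeping the top-level id for referencing."""
    payload = {
        "id": doc["id"],
        "title": doc["title"],
        "items": doc["properties"]["rows"],
    }
    return json.dumps(payload, indent=2)


payload_text = to_api_payload(data)
print(payload_text)
```

Numbers stay numbers in the output, so API consumers receive typed values and do not have to parse display strings.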

By decoupling data from presentation, you gain maintainability, consistency, and flexibility as your content grows.


Learn more about getting the benefits of DITA XML without its complexity. Modern docs-as-code workflows let technical writers structure information using lightweight, open tools—no XML headaches required.
