Boost Documentation Efficiency: How YAML Outperforms XML, Markdown, and Databases

YAML is a lightweight, human-readable data format that simplifies configuration files and data exchange. Its clarity and flexibility have made it increasingly popular among developers. For structured content, it often works better than XML, Markdown tables, or a database.

You can also leverage YAML for structured documentation.

Imaginary Scenario: Choosing the Right Engine Oil

Imagine you’re an engine oil manufacturer. Every day, customers ask you which oil is best for their engines. Some have one-cylinder engines, others have multi-cylinder setups. Price, viscosity, and compatibility all matter—but helping them make the right choice isn’t just about knowing your products. It’s about how you store and present that information.

At first, it might seem simple. You could create a quick reference table:

Oil Type	Brand	Use	Price	Viscosity Grade
Primary oil	A1X	One-cylinder engines	15	0W-20
Secondary oil	B2Z	Two-cylinder engines	17	5W-30

Looks neat, right? But as your product line grows, so does the complexity. Adding new oils, updating prices, or including extra metadata like cylinder count or warranty quickly turns into a maintenance nightmare.


The XML temptation

Some teams start with DITA reference XML, thinking structure solves the problem:

DITA reference XML

<reference id="oil-types">
  <title>Oil types</title>
  <shortdesc>You will find below the recommended oil types.</shortdesc>
  <refbody>
    <section>
      <title>Primary oil</title>
      <ul>
        <li>Brand: A1X</li>
        <li>Use: One-cylinder engines</li>
        <li>Price: 15</li>
        <li>Viscosity grade: 0W-20</li>
      </ul>
    </section>
    <section>
      <title>Secondary oil</title>
      <ul>
        <li>Brand: B2Z</li>
        <li>Use: Two-cylinder engines</li>
        <li>Price: 17</li>
        <li>Viscosity grade: 5W-30</li>
      </ul>
    </section>
  </refbody>
</reference>

It seems neat at first. Each oil has a dedicated section. But the moment prices change, new products are added, or you want to track extra attributes, the XML becomes cumbersome.

Issue	Description
Hardcoded Values	Every data point is embedded in XML. Updates require manual changes, which is error-prone.
Mixing Data and Presentation	`<ul>` and `<li>` are presentation elements, and each `<li>` packs a field name and its value into one string. Scripts must parse text to extract data.
Poor Scalability	Adding oils or metadata requires repeating XML structures.
Lack of Unique Identifiers	Sections are distinguished by titles only, risking breakage in workflows if names change.
Limited Reusability	Copying sections across documents increases the risk of inconsistencies.
Ambiguous Values	`<li>Price: 15</li>` lacks units or currency.
No Validation for Consistent Structure	Nothing checks that every section has the same fields. Missing or misspelled fields go unnoticed, and data quality degrades over time.

Hardcoded XML works for tiny lists—but it quickly becomes brittle as content grows.
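Mixing labels and values in `<li>` elements makes automation harder. To show why, here is a minimal TypeScript sketch of what a script has to do with the list-item text once an XML parser has extracted it. The parsing step is omitted. The `parseListItems` helper and its snake_case key convention are hypothetical.

```typescript
// Text content of the <li> elements in the "Primary oil" section,
// as an XML parser would hand it over (parsing itself is omitted).
const primaryOilItems: string[] = [
  "Brand: A1X",
  "Use: One-cylinder engines",
  "Price: 15",
  "Viscosity grade: 0W-20",
];

// Hypothetical helper: split each "Label: value" string on its first colon.
// Labels must be normalized by hand, and every value stays a string.
function parseListItems(items: string[]): Record<string, string> {
  const record: Record<string, string> = {};
  for (const item of items) {
    const colon = item.indexOf(":");
    if (colon === -1) {
      throw new Error(`Cannot parse list item: "${item}"`);
    }
    // "Viscosity grade" -> "viscosity_grade"
    const key = item.slice(0, colon).trim().toLowerCase().replace(/\s+/g, "_");
    record[key] = item.slice(colon + 1).trim();
  }
  return record;
}

const primaryOil = parseListItems(primaryOilItems);
// primaryOil.price is the string "15": the markup does not say whether it
// is a number, which currency it uses, or whether other sections spell
// the "Price" label the same way.
```

A single typo can break this kind of parsing. Writing `Price 15` without the colon makes the parser throw. Writing `Cost: 15` quietly produces a different key.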

Markdown tables: simple but limiting

Markdown tables are easier to read:

| Oil Type      | Brand | Use                  | Price | Viscosity Grade |
| ------------- | ----- | -------------------- | ----- | --------------- |
| Primary oil   | A1X   | One-cylinder engines | 15    | 0W-20           |
| Secondary oil | B2Z   | Two-cylinder engines | 17    | 5W-30           |

They render nicely and are human-friendly—but they carry hidden maintainability problems:

Issue	Description
Hardcoded Data	Manual updates are required for any change, which is error-prone.
Lack of Semantic Structure	Field names and values are not machine-readable, making automation difficult.
Poor Scalability	Adding a new field means adding a column to every row, and each extra column makes rows longer and harder to edit.
No Unique Identifiers	Rows are identified only by “Oil Type,” making programmatic referencing unreliable.
Ambiguities	Values like `15` in the Price column lack units or currency, which can cause misinterpretation.
Limited Reusability	Tables are hard to reuse across multiple documents, creating redundancy.

The limits of Markdown tables
Markdown was designed so that the source should be almost as human-readable as the output, whether rendered as HTML, PDF, or another format. But tables are an exception: they introduce several challenges.
Long lines quickly become a nightmare to read and edit, as text editors wrap them and make it hard to distinguish one row from another. The visual benefit of tables for readers—having columns neatly aligned on the same vertical lines—turns into a problem for authors.
| Oil Type | Brand | Use  | Price | Viscosity Grade |
| - | - | - | - | - |
| Primary oil | A1X | One-cylinder engines | 15 | 0W-20 |
| Secondary oil | B2Z | Two-cylinder engines | 17 | 5W-30 |
Markdown source table
Some text editors can automatically realign table columns when you edit a cell, but this rewrites the padding of every row. The result is a very noisy Git diff: Git flags entire lines as changed even though only a few whitespace characters were adjusted. Conversely, if you skip realignment, the columns drift out of line and the table becomes hard for humans to parse and maintain.
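The realignment problem is easy to reproduce. The TypeScript sketch below uses `alignTable`, a hypothetical helper rather than any specific editor's formatter. It pads every cell to the width of its widest column entry, as auto-format features typically do. A longer brand name widens the Brand column, so every line of the table changes.

```typescript
// Hypothetical helper mimicking an editor's "format table" command:
// pad every cell to the width of the widest cell in its column.
function alignTable(rows: string[][]): string[] {
  const widths = rows[0].map((_, col) =>
    Math.max(...rows.map((row) => row[col].length)),
  );
  return rows.map(
    (row) =>
      "| " +
      row
        .map((cell, col) =>
          // Separator cells ("-", "---", ...) are stretched to the column width.
          /^-+$/.test(cell) ? "-".repeat(widths[col]) : cell.padEnd(widths[col]),
        )
        .join(" | ") +
      " |",
  );
}

const before: string[][] = [
  ["Oil Type", "Brand", "Use", "Price", "Viscosity Grade"],
  ["-", "-", "-", "-", "-"],
  ["Primary oil", "A1X", "One-cylinder engines", "15", "0W-20"],
  ["Secondary oil", "B2Z", "Two-cylinder engines", "17", "5W-30"],
];

// Edit a single cell: a longer (hypothetical) brand name widens the Brand column.
const after = before.map((row) => [...row]);
after[2][1] = "A1X Premium";

const oldLines = alignTable(before);
const newLines = alignTable(after);
const changedLines = oldLines.filter((line, i) => line !== newLines[i]).length;
console.log(`${changedLines} of ${oldLines.length} lines changed`); // 4 of 4 lines changed
```

A same-width edit, such as changing `15` to `16`, touches only one line. The whole table changes only when an edit changes a column's width.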

Markdown is fine for static pages—but as your documentation grows, you need a better approach.

Databases: flexible but costly

Databases offer structured storage and queries, but repeated queries can create performance issues:

Issue	Description
Performance Overhead	Each query consumes resources; repeated queries slow the system.
Increased Network Traffic	Multiple queries generate unnecessary load.
Higher Costs	Cloud usage costs rise with frequent queries.
Data Consistency Issues	Data changes between queries can produce inconsistent results.
Scalability Problems	Heavy query repetition strains the database.
Redundant Computation	Expensive operations are repeated unnecessarily.
Increased Latency	Users notice slower response times for each query.

Mitigation strategies like caching, batching, materialized views, and optimized indexing help—but they add complexity.
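To make that trade-off concrete, here is a minimal TypeScript sketch of the simplest mitigation: a time-based cache around a query. `cachedQuery`, the `Oil` shape, and the synchronous `query` function are assumptions for illustration. A real database client would be asynchronous. Even this small wrapper forces decisions that a build-time YAML file avoids: how long data may be stale, and who calls `invalidate()` when prices change.

```typescript
type Oil = { type: string; name: string; price: number };

// Hypothetical wrapper: reuse a query result for ttlMs milliseconds.
// `now` is injectable so that expiry can be tested without waiting.
function cachedQuery<T>(
  query: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): { get: () => T; invalidate: () => void } {
  let cached: { value: T; expiresAt: number } | undefined;
  return {
    get() {
      const t = now();
      if (cached === undefined || t >= cached.expiresAt) {
        cached = { value: query(), expiresAt: t + ttlMs };
      }
      return cached.value;
    },
    // Someone must call this whenever prices change; otherwise readers
    // see stale rows until the TTL runs out.
    invalidate() {
      cached = undefined;
    },
  };
}

// Stand-in for a database call that counts how often it runs.
let queryCount = 0;
const fetchOils = (): Oil[] => {
  queryCount += 1;
  return [{ type: "Primary oil", name: "A1X", price: 15 }];
};

let clock = 0;
const oils = cachedQuery(fetchOils, 60000, () => clock);
oils.get();
oils.get(); // served from the cache; fetchOils ran only once
```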

YAML: readable, structured, and scalable

This is where YAML shines. It’s human-readable, hierarchical, and structured, making it ideal for documentation that needs to scale.

id: oil-types
title: Oil types
shortdesc: Recommended oil types
properties:
  headers:
    type: Type
    value: Brand
    usage: Use
  rows:
    - type: Primary oil
      value: A1X
      usage: One-cylinder engines
    - type: Secondary oil
      value: B2Z
      usage: Two-cylinder engines

Rendered in Markdown, it produces a clean table:

Type	Brand	Use
Primary oil	A1X	One-cylinder engines
Secondary oil	B2Z	Two-cylinder engines
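Rendering is a small, mechanical step. Below is a minimal TypeScript sketch that uses a hypothetical `toMarkdownTable` function. It assumes the YAML above has already been loaded into a plain object, for example by a YAML library at build time. The keys of `headers` decide which columns appear and in what order, and each row is looked up by the same keys.

```typescript
// Shape of the YAML above once a YAML loader has parsed it (loading is assumed).
type OilTable = {
  id: string;
  title: string;
  shortdesc: string;
  properties: {
    headers: Record<string, string>;
    rows: Record<string, string>[];
  };
};

const oilTypes: OilTable = {
  id: "oil-types",
  title: "Oil types",
  shortdesc: "Recommended oil types",
  properties: {
    headers: { type: "Type", value: "Brand", usage: "Use" },
    rows: [
      { type: "Primary oil", value: "A1X", usage: "One-cylinder engines" },
      { type: "Secondary oil", value: "B2Z", usage: "Two-cylinder engines" },
    ],
  },
};

// Column order follows the insertion order of the `headers` keys.
function toMarkdownTable(table: OilTable): string {
  const { headers, rows } = table.properties;
  const keys = Object.keys(headers);
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  return [
    line(keys.map((k) => headers[k])),
    line(keys.map(() => "---")),
    // Missing fields render as empty cells instead of shifting columns.
    ...rows.map((row) => line(keys.map((k) => row[k] ?? ""))),
  ].join("\n");
}

console.log(toMarkdownTable(oilTypes));
```

Because the table is generated, nobody has to keep its column alignment tidy by hand.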

Benefits of YAML

Feature	Details
Separation of Data and Presentation	Data can be reused in tables, lists, APIs, or UIs without touching formatting.
Structured and Predictable	A consistent schema, validated at build time, reduces human error and simplifies automation.
Easy to Extend	Add new oils or metadata without redesigning the structure.
Supports Automation	Scripts or static site generators can consume YAML directly.
Unique Identifiers	Top-level `id` allows reliable referencing across documents.
Improved Readability and Maintainability	Keys are self-explanatory, easier to understand than inline XML or Markdown tables.
Scalable for Large Datasets	Works equally well for 5 or 500 rows; updates and validation remain straightforward.

Versatility

One of the key advantages of using structured data formats like YAML is versatility. The same dataset can be rendered in multiple ways—lists, tables, or even more complex layouts—without changing the source file.
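In code, this versatility amounts to one data array and several small render functions. The sketch below is a hypothetical TypeScript illustration. Its field names follow the `OilRow` type used in this post's Astro component, and `renderList` and `renderBrandDetails` are illustrative names. Price formatting lives in the renderers, so the data stays a plain number.

```typescript
type OilRow = {
  type: string;
  name: string; // brand
  usage: string;
  viscosity_grade: string;
  price: number; // assumed to be in US dollars
};

const rows: OilRow[] = [
  { type: "Primary oil", name: "A1X", usage: "One-cylinder engines", viscosity_grade: "0W-20", price: 15 },
  { type: "Secondary oil", name: "B2Z", usage: "Two-cylinder engines", viscosity_grade: "5W-30", price: 17 },
];

// Presentation detail: 15 -> "$15.00".
const formatPrice = (price: number): string => `$${price.toFixed(2)}`;

// Renderer 1: a labeled list, one block per oil.
function renderList(data: OilRow[]): string {
  return data
    .map((row) =>
      [
        row.type,
        `Brand: ${row.name}`,
        `Usage: ${row.usage}`,
        `Viscosity grade: ${row.viscosity_grade}`,
        `Price: ${formatPrice(row.price)}`,
      ].join("\n"),
    )
    .join("\n\n");
}

// Renderer 2: a two-column table keyed by brand, from the same rows.
function renderBrandDetails(data: OilRow[]): string[][] {
  return [
    ["Brand", "Details"],
    ...data.map((row) => [row.name, `${row.type} ${formatPrice(row.price)} ${row.viscosity_grade}`]),
  ];
}
```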

Example: Display your data as a simple list
Oil types

You will find below the recommended oil types.

Primary oil

Brand: A1X

Usage: One-cylinder engines

Viscosity grade: 0W-20

Price: $15.00

Secondary oil

Brand: B2Z

Usage: Two-cylinder engines

Viscosity grade: 5W-30

Price: $17.00

Or you can present it as a styled two-column table on desktop (that automatically collapses into individual labeled rows inside cards on mobile):

Example: Display your data as a styled two-column table
Oil types

You will find below the recommended oil types.

Brand	Details
A1X	Primary oil $15.00 1 0W-20
B2Z	Secondary oil $17.00 2 5W-30

If you need a richer display, you can even switch to a five-column table on desktop (that automatically stacks into a single-column card view on mobile)—all without editing the source YAML.

flowchart LR
  %% Nodes
  B["YAML data"]
  F1["Title"]
  F2["Short<br/>description"]
  F3["Table<br/>header"]
  H["One table row<br/>per YAML entry"]

  %% Row values (documents) in one subgraph
  subgraph RowValues[ ]
    direction LR
    J["Primary<br/>oil"]
    K["A1X"]
    L["..."]
  end

  %% Row headers in one subgraph
  subgraph RowHeaders[ ]
    direction LR
    H1["Type"]
    H2["Brand"]
    H3["..."]
  end

  %% Flows with animated links
  B a1@--> F1
  B a2@--> F2
  B a3@--> F3
  B a4@--> H
  H a5@-.-> J
  H a6@-.-> K
  H a7@-.-> L
  F3 a8@--> H1
  F3 a9@--> H2
  F3 a10@--> H3

  %% Animation speeds
  a1@{ animation: slow }
  a2@{ animation: slow }
  a3@{ animation: slow }
  a4@{ animation: fast }
  a5@{ animation: slow }
  a6@{ animation: slow }
  a7@{ animation: slow }
  a8@{ animation: slow }
  a9@{ animation: slow }
  a10@{ animation: slow }

  %% Styling
  classDef topic fill:#fbe9e4,stroke:#df9277,stroke-width:2px,color:#5a2e23,font-weight:bold,rx:8,ry:8;
  classDef step fill:#f6d1c5,stroke:#d97d61,stroke-width:2px,color:#4a241a,font-weight:bold,rx:8,ry:8;
  classDef substep fill:#3498dc,stroke:#1f5a82,stroke-width:2px,color:#ffffff,font-weight:bold,rx:8,ry:8;
  classDef documents fill:#ebf0f1,stroke:#4c4c4c,stroke-width:2px,color:#000000,font-weight:bold,rx:8,ry:8;

  %% Apply classes
  class B topic;
  class F1,F2,F3 step;
  class H1,H2,H3 substep;
  class J,K,L documents;

  %% Link styles
  linkStyle default stroke:#d97d61,stroke-width:2px;

Processing Oil Data: From YAML Import to Dynamic HTML Table
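The flow in the diagram above can be sketched in a few lines of TypeScript. The title and short description come from top-level keys, the header row comes from `headers`, and each entry in `rows` becomes one `<tr>`. `renderHtml`, `escapeHtml`, and the `columns` parameter are hypothetical. The header labels for `viscosity_grade` and `price` are also assumptions. Passing a different column list changes the layout without touching the data.

```typescript
type Doc = {
  title: string;
  shortdesc: string;
  properties: {
    headers: Record<string, string>;
    rows: Record<string, string | number>[];
  };
};

// Escape text before inserting it into HTML.
const escapeHtml = (text: string): string =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// `columns` selects which header keys to show, and in what order.
function renderHtml(doc: Doc, columns: string[] = Object.keys(doc.properties.headers)): string {
  const { headers, rows } = doc.properties;
  const head = columns.map((key) => `<th>${escapeHtml(headers[key] ?? key)}</th>`).join("");
  const body = rows
    .map((row) => `<tr>${columns.map((key) => `<td>${escapeHtml(String(row[key] ?? ""))}</td>`).join("")}</tr>`)
    .join("\n");
  return [
    `<h2>${escapeHtml(doc.title)}</h2>`,
    `<p>${escapeHtml(doc.shortdesc)}</p>`,
    "<table>",
    `<thead><tr>${head}</tr></thead>`,
    `<tbody>\n${body}\n</tbody>`,
    "</table>",
  ].join("\n");
}

const oilDoc: Doc = {
  title: "Oil types",
  shortdesc: "You will find below the recommended oil types.",
  properties: {
    headers: { type: "Type", name: "Brand", usage: "Use", viscosity_grade: "Viscosity grade", price: "Price" },
    rows: [
      { type: "Primary oil", name: "A1X", usage: "One-cylinder engines", viscosity_grade: "0W-20", price: 15 },
      { type: "Secondary oil", name: "B2Z", usage: "Two-cylinder engines", viscosity_grade: "5W-30", price: 17 },
    ],
  },
};

const fullTable = renderHtml(oilDoc); // all five columns
const compactTable = renderHtml(oilDoc, ["name", "type"]); // two columns, same data
```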

Example: Display your data as a dynamic HTML table

Type ▲▼	Brand ▲▼	Cylinders ▲▼	Viscosity grade ▲▼	Price ▲▼
Primary oil	A1X	One-cylinder engines	0W-20	$15.00
Secondary oil	B2Z	Two-cylinder engines	5W-30	$17.00
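The ▲▼ arrows mean each column can be sorted. The cells hold rendered text such as `$15.00` or `One-cylinder engines`, so sorting needs a type per column. The Astro component later in this post tags its headers with `data-type="text"`, `"number"`, or `"cylinders"` and defines a `wordsToNumber` lookup. The sketch below is a hypothetical reconstruction of that idea in plain TypeScript, not the component's actual code.

```typescript
// Column types, mirroring the data-type values in this post's table component.
type ColumnType = "text" | "number" | "cylinders";

// Hypothetical lookup, so "Two-cylinder engines" sorts after "One-cylinder engines".
const wordsToNumber: Record<string, number> = { one: 1, two: 2, three: 3, four: 4, six: 6, eight: 8 };

// Turn a rendered cell back into a sortable number.
function toNumber(cell: string, type: "number" | "cylinders"): number {
  if (type === "number") {
    // "$15.00" -> 15
    return Number.parseFloat(cell.replace(/[^0-9.-]/g, ""));
  }
  // "One-cylinder engines" -> "one" -> 1; unknown words sort last.
  const word = cell.split("-")[0].toLowerCase();
  return wordsToNumber[word] ?? Number.MAX_SAFE_INTEGER;
}

function compareCells(a: string, b: string, type: ColumnType): number {
  if (type === "text") {
    return a.localeCompare(b);
  }
  return toNumber(a, type) - toNumber(b, type);
}

// Returns a sorted copy; the original rows are left untouched.
function sortRows(rows: string[][], column: number, type: ColumnType, ascending = true): string[][] {
  const direction = ascending ? 1 : -1;
  return [...rows].sort((a, b) => direction * compareCells(a[column], b[column], type));
}

const tableRows: string[][] = [
  ["Primary oil", "A1X", "One-cylinder engines", "0W-20", "$15.00"],
  ["Secondary oil", "B2Z", "Two-cylinder engines", "5W-30", "$17.00"],
];

const byPriceDescending = sortRows(tableRows, 4, "number", false);
```

Another option is to sort on the raw YAML values and render afterwards, which avoids parsing text back into numbers.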

This flexibility allows your documentation to adapt to different contexts—whether it’s a quick reference list for users, a detailed table for technical readers, or a complex component in a UI. By separating content from presentation, you can maintain a single source of truth while offering multiple ways to consume the data.

Easier diffs and cleaner version control

One hidden drawback of Markdown tables is how poorly they behave in version control. Removing or reordering a column changes every row of the table. Because Git compares files line by line, the diff shows every row as deleted and re-added, which makes it hard to read.

Git diff

diff --git a/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx b/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
index 61dadd8..0f70057 100644
--- a/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
+++ b/src/content/blog/scalable-maintainable-technical-docs-with-yaml.mdx
@@ -26,10 +26,10 @@ Imagine you’re an engine oil manufacturer. Every day, customers ask you which

 At first, it might seem simple. You could create a quick reference table:

-| Oil Type      | Brand | Use                  | Price | Viscosity Grade |
-| ------------- | ----- | -------------------- | ----- | --------------- |
-| Primary oil   | A1X   | One-cylinder engines | 15    | 0W-20           |
-| Secondary oil | B2Z   | Two-cylinder engines | 17    | 5W-30           |
+| Oil Type      | Use                  | Price | Viscosity Grade |
+|---------------|----------------------|-------|-----------------|
+| Primary oil   | One-cylinder engines | 15    | 0W-20           |
+| Secondary oil | Two-cylinder engines | 17    | 5W-30           |

Tip: Mitigate the issue with third-party tools
To partially improve the readability of table diffs, you can use tools like GitHub Desktop, run `git diff --word-diff`, or define a custom diff driver in `.gitattributes`.
These approaches make diffs more palatable to human readers, but under the hood, Git still considers entire lines as changed.
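By default, GitHub Desktop and the command line show each affected row twice: once in red as removed and once in green as added. This happens even when the only change was deleting a column. It reflects Git’s line-based diffing, where any change to a line marks the whole line as modified.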
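Word-level diffing compares tokens instead of whole lines. The idea can be sketched with a longest-common-subsequence diff over whitespace-separated tokens. This is an illustration only. Git has its own diff algorithms and a configurable word regex, and `wordDiff` is a hypothetical helper. Splitting on whitespace also means column padding is ignored.

```typescript
type Change = { op: "keep" | "del" | "add"; token: string };

// Hypothetical word-level diff: split on whitespace (so column padding is
// ignored) and align the tokens with a longest-common-subsequence table.
function wordDiff(oldLine: string, newLine: string): Change[] {
  const a = oldLine.split(/\s+/).filter(Boolean);
  const b = newLine.split(/\s+/).filter(Boolean);
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, emitting kept, deleted, and added tokens.
  const changes: Change[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      changes.push({ op: "keep", token: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      changes.push({ op: "del", token: a[i++] });
    } else {
      changes.push({ op: "add", token: b[j++] });
    }
  }
  while (i < a.length) changes.push({ op: "del", token: a[i++] });
  while (j < b.length) changes.push({ op: "add", token: b[j++] });
  return changes;
}

// Header row before and after removing the Brand column (from the diff above).
const oldHeader = "| Oil Type      | Brand | Use                  | Price | Viscosity Grade |";
const newHeader = "| Oil Type      | Use                  | Price | Viscosity Grade |";

const edits = wordDiff(oldHeader, newHeader).filter((c) => c.op !== "keep");
console.log(edits); // only two deletions: "Brand" and one "|"
```

The separator row is a different story. Its dashes change length along with the padding, so the separator tokens differ and the separator row still shows up as a full change.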

By contrast, when tables are generated from a YAML source, all you need to do is remove the corresponding key-value pairs in the YAML file.

Git diff

diff --git a/src/data/oil-types.yaml b/src/data/oil-types.yaml
index 11eef57..87c04df 100644
--- a/src/data/oil-types.yaml
+++ b/src/data/oil-types.yaml
@@ -5,24 +5,20 @@ shortdesc: You will find below the recommended oil types.
 properties:
   headers:
     type: Type
-    name: Brand
     usage: Use
   row_schema:
     type: str
-    name: str
     price: float
     cylinders: int
     viscosity_grade: str
   rows:
     - type: Primary oil
-      name: A1X
       price: 15.0
       cylinders: 1
       viscosity_grade: 0W-20
     - type: Secondary oil
-      name: B2Z
       price: 17.0
       cylinders: 2
       viscosity_grade: 5W-30

That change is perfectly legible in a Git diff.

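An even better approach, especially when other outputs still use the brand data, is to leave the YAML untouched and edit only the component that generates the table. The change is often just a few lines of code and produces a simple, human-readable diff that’s easier to review, merge, and revert.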

Git diff

diff --git a/src/components/table.astro b/src/components/table.astro
index c05af99..d594203 100644
--- a/src/components/table.astro
+++ b/src/components/table.astro
@@ -3,7 +3,6 @@ import data from "../data/oil-types.yaml";
 type OilRow = {
   type: string;
-  name: string;
   usage: string;
   viscosity_grade: string;
   price: number;
@@ -34,7 +33,6 @@ const wordsToNumber: Record<string, number> = Object.fromEntries(
     <thead>
       <tr>
         <th data-key="type" data-type="text">Type <span class="arrow">▲▼</span></th>
-        <th data-key="name" data-type="text">Brand <span class="arrow">▲▼</span></th>
         <th data-key="cylinders" data-type="cylinders">Cylinders <span class="arrow">▲▼</span></th>
         <th data-key="viscosity_grade" data-type="text">Viscosity grade <span class="arrow">▲▼</span></th>
         <th data-key="price" data-type="number">Price <span class="arrow">▲▼</span></th>
@@ -44,7 +42,6 @@ const wordsToNumber: Record<string, number> = Object.fromEntries(
       {rows.map((row) => (
         <tr>
           <td data-label="Type">{row.type}</td>
-          <td data-label="Brand">{row.name}</td>
           <td data-label="Cylinders">{numberToWords(row.cylinders)}-cylinder engines</td>
           <td data-label="Viscosity grade">{row.viscosity_grade}</td>
           <td data-label="Price">${row.price.toFixed(2)}</td>
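In that diff, removing one column still takes three deletions: the type field, the `<th>`, and the `<td>`. Driving header cells and row cells from a single column list reduces it to one line. The following is a minimal TypeScript sketch of that design. `Column`, `columns`, `renderRows`, and this `numberToWords` are hypothetical names, not the actual `table.astro` code.

```typescript
type OilRow = {
  type: string;
  name: string;
  cylinders: number;
  viscosity_grade: string;
  price: number;
};

type Column = {
  key: keyof OilRow;
  label: string;
  format: (row: OilRow) => string;
};

// Hypothetical stand-in for the component's numberToWords helper.
const numberToWords = (n: number): string =>
  ["Zero", "One", "Two", "Three", "Four", "Five", "Six"][n] ?? String(n);

// Every visible column is one entry here, so removing the Brand
// column means deleting a single line.
const columns: Column[] = [
  { key: "type", label: "Type", format: (row) => row.type },
  { key: "name", label: "Brand", format: (row) => row.name },
  { key: "cylinders", label: "Cylinders", format: (row) => `${numberToWords(row.cylinders)}-cylinder engines` },
  { key: "viscosity_grade", label: "Viscosity grade", format: (row) => row.viscosity_grade },
  { key: "price", label: "Price", format: (row) => `$${row.price.toFixed(2)}` },
];

// Header cells and row cells come from the same list, so they cannot drift apart.
function renderRows(rows: OilRow[], cols: Column[]): string {
  const header = `<tr>${cols.map((c) => `<th data-key="${c.key}">${c.label}</th>`).join("")}</tr>`;
  const body = rows.map(
    (row) => `<tr>${cols.map((c) => `<td data-label="${c.label}">${c.format(row)}</td>`).join("")}</tr>`,
  );
  return [header, ...body].join("\n");
}

const oilRows: OilRow[] = [
  { type: "Primary oil", name: "A1X", cylinders: 1, viscosity_grade: "0W-20", price: 15 },
  { type: "Secondary oil", name: "B2Z", cylinders: 2, viscosity_grade: "5W-30", price: 17 },
];

const withBrand = renderRows(oilRows, columns);
const withoutBrand = renderRows(oilRows, columns.filter((c) => c.key !== "name"));
```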

Growing with your YAML

Our oil-types.yaml started as the source of a simple table but evolved into a single source of truth for:

Reusable DITA reference topics
OpenAPI JSON endpoints
UI components for apps

Declaring types in the `row_schema` block (floats for prices, integers for cylinder counts) lets a build step reject inconsistent rows automatically. And because every output reads from one central file, the data stays DRY (Don’t Repeat Yourself).
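YAML itself does not check types, so consistency comes from validating rows against the schema at build time. Here is a minimal TypeScript sketch. It assumes the `row_schema` block from the YAML diff above has been loaded as a plain object. `validateRows` and the `checks` table are hypothetical, and a real project might use a JSON Schema validator instead. JavaScript has a single number type, so `float` accepts a value like `15.0`, which loads as `15`.

```typescript
// Type names as they appear in the row_schema block of oil-types.yaml.
type SchemaType = "str" | "int" | "float";
type RowSchema = Record<string, SchemaType>;

const rowSchema: RowSchema = {
  type: "str",
  name: "str",
  price: "float",
  cylinders: "int",
  viscosity_grade: "str",
};

// Hypothetical runtime check for each schema type.
const checks: Record<SchemaType, (value: unknown) => boolean> = {
  str: (v) => typeof v === "string" && v.length > 0,
  int: (v) => typeof v === "number" && Number.isInteger(v),
  float: (v) => typeof v === "number" && Number.isFinite(v),
};

// Returns one message per problem; an empty array means every row is valid.
function validateRows(rows: Record<string, unknown>[], schema: RowSchema): string[] {
  const errors: string[] = [];
  rows.forEach((row, index) => {
    for (const [field, expected] of Object.entries(schema)) {
      if (!(field in row)) {
        errors.push(`row ${index}: missing "${field}"`);
      } else if (!checks[expected](row[field])) {
        errors.push(`row ${index}: "${field}" should be ${expected}, got ${JSON.stringify(row[field])}`);
      }
    }
    for (const field of Object.keys(row)) {
      if (!(field in schema)) errors.push(`row ${index}: unexpected field "${field}"`);
    }
  });
  return errors;
}

const goodRows = [
  { type: "Primary oil", name: "A1X", price: 15.0, cylinders: 1, viscosity_grade: "0W-20" },
  { type: "Secondary oil", name: "B2Z", price: 17.0, cylinders: 2, viscosity_grade: "5W-30" },
];

// A deliberately broken row: price typed as text, cylinder count missing.
const badRow = { type: "Secondary oil", name: "B2Z", price: "$17", viscosity_grade: "5W-30" };
```

Run as a build step, a non-empty result can fail the build before a broken row reaches any table, API, or UI.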

Why a structured source is better than embedded tables

Tables are a convenient way to present structured data in a familiar and scannable layout for users. However, generating user-facing tables directly from Markdown embedded in source files is not the most efficient approach.

A stronger alternative is to extract data at build time from a structured source, such as a YAML single source of truth. This approach allows you to render the same information in multiple ways, each tailored to the target medium and the specific needs of your audience—whether that’s a Markdown table in documentation, a JSON payload for an API, or a dynamic HTML component in a UI.

By decoupling data from presentation, you gain maintainability, consistency, and flexibility as your content grows.

Learn more about getting the benefits of DITA XML without its complexity. Modern docs-as-code workflows let technical writers structure information using lightweight, open tools—no XML headaches required.