
Source Format

The content of a technical writing project is created in a source format, which differs from the target format of the deliverables. To use a common analogy, the source format is the recipe and the target format is the finished dish. In photography, the source format is the RAW file produced by the camera, which professional photographers prefer to edit, while the target format is JPEG.

Word processors have trained us out of distinguishing content from form. Confusing the two leads to many errors and wasted time.

A document presented to the user has two fundamental aspects:

  • content,
  • layout.

During the development of technical documentation, these two aspects must be clearly separated. They may be handled by two different professionals:

  • the technical writer,
  • the graphic designer.

When layout is as important as content, or when it must be varied, such as in a marketing brochure, writing and layout are handled with different tools:

  • text editor,
  • page layout software, such as InDesign or Scribus.

When layout is less important than content, or when it must be uniform, as in technical documentation, writing and layout occur in:

  • the same files, for example FrameMaker files,
  • different files, for example XML content files with an XSLT stylesheet.
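The "different files" case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`topic`, `title`, `step`) and the sample text are invented, and the two rendering functions stand in for stylesheets. No actual XSLT is used here.

```python
import xml.etree.ElementTree as ET

# Content only: no fonts, margins, or page geometry appear here.
# The element names are hypothetical, not taken from DITA or DocBook.
CONTENT = """<topic>
  <title>Installing the pump</title>
  <step>Close the inlet valve.</step>
  <step>Bolt the pump to the base plate.</step>
</topic>"""

def render_html(topic):
    """One possible layout: an HTML fragment."""
    lines = [f"<h1>{topic.findtext('title')}</h1>", "<ol>"]
    lines += [f"  <li>{step.text}</li>" for step in topic.findall("step")]
    lines.append("</ol>")
    return "\n".join(lines)

def render_text(topic):
    """Another layout for the same content: underlined title, numbered lines."""
    title = topic.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [f"{n}. {step.text}" for n, step in enumerate(topic.findall("step"), 1)]
    return "\n".join(lines)

topic = ET.fromstring(CONTENT)
html_output = render_html(topic)
text_output = render_text(topic)
print(html_output)
print(text_output)
```

Changing a rendering function changes every deliverable at once, while the content string is never touched.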

In a FrameMaker file, the separation of content and form is high but not total: content and layout are in the same file. FrameMaker applies a uniform page template to the entire file but allows manual addition of layout elements. The same template can be duplicated across the document, or different templates can be used for each file composing the document.

[Figure: Source formats: degree of modularity and format]

Source formats can be classified by their degree of modularity and file type.

The structured XML formats DocBook and DITA XML apply a uniform page template to the entire document. They allow neither manual layout changes nor different templates for different files within the document.

Format       Manual layout possible?
MS Word      Yes
FrameMaker   Yes
DITA XML     No

When content and layout are closely linked, as in a word processor, modifying content without affecting layout is difficult. As a result, each new version of technical documentation requires long hours correcting layout errors generated by the software. This issue is less severe with FrameMaker and virtually nonexistent with DITA XML or DocBook (the only errors possible are compilation errors due to invalid XML syntax, which are easy to fix).
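To illustrate why such syntax errors are easy to fix, here is a minimal sketch using Python's standard XML parser. It checks well-formedness only, not validity against a DITA or DocBook schema, and the sample fragments are invented for the example.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if the XML parses, else a short message locating the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"line {line}, column {column}: {err}"
    return None

good = "<task><title>Replace the filter</title></task>"
bad = "<task><title>Replace the filter</task>"  # <title> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

The parser reports the line and column of the problem, so the writer knows exactly where to look.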

Source files for technical documentation are either:

  • binary, or
  • text.

They can also be:

  • WYSIWYG, or
  • structured.

Finally, they can be:

  • modular, or
  • monolithic.

This last aspect determines how the format handles single-sourcing:

  • book-to-online help, or
  • online help-to-book.

Structured format   Manual layout possible?
FrameMaker          Yes
DocBook             No
DITA XML            No

FrameMaker and DocBook are not fully modular because their smallest manipulable content elements are not generic: they include information, such as table-of-contents structure or cross-references, that is valid only in limited contexts.

Source formats can rely on monolithic files or clusters of modular files.

Monolithic files (e.g., MS Word, LibreOffice, or FrameMaker) centralize all content in one file. This makes them easy to handle but limits content sharing and increases the risk of duplicate or inconsistent information.

[Figure: Monolithic technical writing source format]

Clusters of modular files (e.g., DITA XML) aggregate content from multiple files, promoting content sharing and block reuse. While difficult to implement enterprise-wide, modular files should be standard for a technical writing team.

[Figure: Modular technical writing source format]
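The aggregation principle can be sketched as follows. The module names, their text, and the two deliverables are hypothetical, and in a real project each module would live in its own file rather than in a dictionary.

```python
# Hypothetical modules: in a real project each would be its own file.
MODULES = {
    "safety": "Disconnect the power supply before opening the housing.",
    "install": "Bolt the unit to the wall bracket.",
    "clean": "Wipe the housing with a dry cloth.",
}

# Each deliverable is only an ordered list of module names.
MAPS = {
    "installation_guide": ["safety", "install"],
    "maintenance_guide": ["safety", "clean"],
}

def build(map_name):
    """Assemble a deliverable by concatenating the modules its map lists."""
    return "\n\n".join(MODULES[name] for name in MAPS[map_name])

def reuse_count(module_name):
    """Count how many deliverables include a given module."""
    return sum(module_name in names for names in MAPS.values())

print(build("installation_guide"))
print(reuse_count("safety"))
```

The safety notice is written once and appears in both guides, so a correction to it reaches every deliverable without any copy and paste.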

Some word processors attempt to handle modular documents, but often poorly. Conversely, a DocBook or DITA XML document can be monolithic, but it loses flexibility.

A modern, widely used technical writing source format is Markdown. Markdown is a lightweight, human-readable text format that clearly separates content from layout.

For example:

  • Headings use # rather than visual styles.
  • Lists use - or * rather than graphical bullets.
  • Bold text is marked with ** and italic text with _.
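Because these markers describe what the text is rather than how it looks, a program can map them to any presentation. The sketch below converts a very small Markdown subset to HTML. It is a toy for illustration, not an implementation of any Markdown specification, and it would mishandle cases such as underscores inside identifiers.

```python
import re

def inline(text):
    """Convert **bold** and _italic_ markers to HTML tags."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return re.sub(r"_(.+?)_", r"<em>\1</em>", text)

def to_html(markdown):
    """Convert a tiny Markdown subset (headings, flat lists, paragraphs) to HTML."""
    html, in_list = [], False
    for line in markdown.splitlines():
        item = re.match(r"[-*] (.+)", line)
        if in_list and not item:
            html.append("</ul>")  # the list ends at the first non-item line
            in_list = False
        heading = re.match(r"(#{1,6}) (.+)", line)
        if heading:
            level = len(heading.group(1))
            html.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif item:
            if not in_list:
                html.append("<ul>")
                in_list = True
            html.append(f"<li>{inline(item.group(1))}</li>")
        elif line.strip():
            html.append(f"<p>{inline(line)}</p>")
    if in_list:
        html.append("</ul>")
    return "\n".join(html)

sample = "# Setup\n- Install the **driver**\n- Restart the _service_\nDone."
result = to_html(sample)
print(result)
```

In practice an established converter would do this work; the point is only that the source says "heading" and "list item", and the visual rendering is decided elsewhere.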

Markdown has several advantages for technical writing:

  1. Simplicity – easy to learn and use.
  2. Modularity – each Markdown file can be a self-contained information module.
  3. Interoperability – text files can be converted to HTML, PDF, DocBook, DITA, or even Word using tools like Pandoc.
  4. Traceability – as plain text, files work perfectly with version control systems like Git.
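As a sketch of the interoperability point, the following script builds Pandoc command lines for two target formats. It assumes Pandoc is installed for the `convert` function to do anything, and the file name `guide.md` is hypothetical. The only Pandoc option used is `-o`, which names the output file.

```python
import shutil
import subprocess
from pathlib import Path

def pandoc_command(source, target):
    """Build a Pandoc command line; Pandoc infers both formats from the file extensions."""
    return ["pandoc", str(source), "-o", str(target)]

def convert(source, target):
    """Run Pandoc if it is installed; return True when a conversion was run."""
    if shutil.which("pandoc") is None:
        return False  # Pandoc is not on the PATH, nothing is converted
    subprocess.run(pandoc_command(source, target), check=True)
    return True

source = Path("guide.md")  # hypothetical file name
commands = [pandoc_command(source, source.with_suffix(ext)) for ext in (".html", ".docx")]
for command in commands:
    print(" ".join(command))
```

The same Markdown source feeds both deliverables, which is the single-sourcing idea applied to a lightweight format.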

Markdown sits between traditional WYSIWYG word processors (easy to use, low modularity) and structured XML formats (high modularity, complex to manage).

The most famous modular system in the world is Lego. In technical writing, modules improve documentation quality and writer productivity.

Simply converting a FrameMaker document to DITA XML or Markdown does not guarantee a modular document. If the original content mixes concepts, procedures, and reference information, conversion may still violate the target format’s semantics.

Likewise, a document whose files each follow a schema (concept, task, or reference) may still be incomplete or inconsistent as a whole.

A true module is an atomic, self-contained unit of information that can be reused in multiple contexts. Dividing a monolithic document into many files is not sufficient; each file must be rewritten minimally to become a genuine module. Structural planning and proper cross-references are essential.

Modules are fully decontextualized, and structural information, such as cross-references, is stored in files separate from textual content.
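A minimal sketch of this separation, similar in spirit to a relationship table in a DITA map: the module text contains no links, and the relationships live in a separate structure. The module names, their text, and the `RELATED` table are all invented for the example.

```python
# Module text contains no links and no mention of other modules.
MODULES = {
    "install": "Bolt the unit to the wall bracket.",
    "configure": "Set the flow rate on the control panel.",
    "troubleshoot": "If the unit does not start, check the fuse.",
}

# Hypothetical relationship table, kept apart from the text.
RELATED = [
    ("install", "configure"),
    ("configure", "troubleshoot"),
]

def related_to(name):
    """Return the modules linked to `name`, in either direction, sorted by name."""
    links = {b for a, b in RELATED if a == name} | {a for a, b in RELATED if b == name}
    return sorted(links)

def render(name):
    """Attach the related links to a module only at publication time."""
    links = ", ".join(related_to(name)) or "none"
    return f"{MODULES[name]}\nSee also: {links}"

print(render("configure"))
```

Removing a module from one deliverable then means editing the relationship table only; no other module contains a link that would break.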

Source formats are either binary or text.

  • Binary formats are opaque: opening them in a text editor shows unreadable characters. They usually require a specific program to edit.
  • Text formats are transparent: opening them in a text editor shows readable text and markup. They can be edited with multiple tools, batch processed from the command line, and manipulated with powerful regular expressions.
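The batch processing mentioned above can be sketched with a short script that renames a product across every Markdown file in a folder. The product names and file names are invented, and the demonstration runs on temporary files so that nothing real is modified.

```python
import re
import tempfile
from pathlib import Path

def rename_product(folder, old, new):
    """Replace whole-word occurrences of `old` with `new` in every .md file under folder.

    Returns the number of files that changed.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    changed = 0
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        updated, hits = pattern.subn(new, text)
        if hits:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed

# Demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "install.md").write_text("Start AcmePump, then wait.", encoding="utf-8")
    Path(tmp, "faq.md").write_text("No product name here.", encoding="utf-8")
    changed_files = rename_product(tmp, "AcmePump", "AcmeFlow")
    install_text = Path(tmp, "install.md").read_text(encoding="utf-8")

print(changed_files, install_text)
```

The same operation on a set of binary files would usually mean opening each one in its dedicated program.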

Markdown is a perfect example of a transparent, simple, modular, and traceable text format.