What is Office Open XML (OOXML)? 

Office Open XML is the name of the file format used by Microsoft Office since Office 2007. Up to Office 2003, Office documents were stored in Office's own binary formats. From Office 2007 onwards, the file format was redefined on the basis of XML. Because the format is XML-based, it has become easier for third-party applications to create, read, and use Office documents.

 

 

In addition to its own binary format, Microsoft Word supported RTF (Rich Text Format), a text-based interchange format that adds formatting information to plain text. Using RTF, Word documents could be read and written by many applications. An XML format, WordprocessingML, was first introduced in Word 2003.

In Office 2007, the document formats for Word, Excel, and PowerPoint were defined in an XML format called Office Open XML. A document created in Office consists of multiple XML files and image files, which are bundled into a single ZIP archive. This ZIP archive is called a package.

 

Example: contents of a Word docx file after unzipping
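Because a docx file is an ordinary ZIP archive, its parts can be listed with any ZIP tool. The following is a minimal Python sketch that uses only the standard `zipfile` module; the function name `list_package_parts` is a hypothetical helper of our own, not part of any OOXML library.

```python
import zipfile


def list_package_parts(docx_path):
    """Return (part name, uncompressed size) pairs for every part in an OOXML package.

    docx_path may be a file path or a binary file-like object. A .docx, .xlsx,
    or .pptx file is a plain ZIP archive, so zipfile can read it directly.
    """
    with zipfile.ZipFile(docx_path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]
```

For example, calling `list_package_parts("report.docx")` on a Word file (`report.docx` is a placeholder name) would typically return entries such as `[Content_Types].xml`, `_rels/.rels`, and `word/document.xml`.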

 

Advantages of the OOXML format


When Microsoft Office documents used binary formats, reading them in other applications required obtaining the specifications from Microsoft and analyzing the binary files.

Since the format was standardized, it has become relatively easy to read the contents of Office documents and reuse them. It has also become easier for other applications to output files compatible with Office.

Applications that can use OOXML


Because OOXML is an international standard, applications other than Microsoft Office can also work with OOXML documents. The main ones are:

  • Antenna House's Office Server Document Converter (formerly Server Based Converter) is a formatting engine compatible with Microsoft Office. It converts OOXML documents to PDF and images without requiring Microsoft Office, and is used for this purpose on servers such as Linux servers.
  • LibreOffice can read and save files in OOXML format, although its compatibility is reportedly limited. OpenOffice can read OOXML files but cannot save them in OOXML format ( [1] ).
  • The Apache POI project provides a Java library for manipulating OOXML files on the server side ( [2] ).
  • PHPWord is a PHP library for reading and writing Word documents in OOXML format ( [3] ).
  • oXygen XML Editor can open an OOXML package and edit the XML inside it directly. The following figure shows a docx file opened in oXygen XML Editor, with the document body (document.xml) open in the editing pane.

Editing a Word docx file in oXygen XML Editor
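The document body shown in the editor is ordinary XML, so it can also be read programmatically. Below is a minimal Python sketch, assuming a Transitional docx (Word's default "Word Document (docx)" save format) whose main part is stored at `word/document.xml`, the location Word uses; strictly speaking, the location is given by the package relationships. The helper name `paragraph_texts` is hypothetical, and the sketch only joins the `w:t` text of top-level body paragraphs, ignoring tables and other structures.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by Transitional documents.
# Assumption: Strict documents use a different namespace and are not handled here.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_texts(docx_path):
    """Return the plain text of each top-level w:p paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find("w:body", NS)
    if body is None:
        return []
    texts = []
    # Direct children only: paragraphs inside tables (w:tbl) are skipped.
    for para in body.findall("w:p", NS):
        # Concatenate every w:t text node in the paragraph's runs.
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts
```

Production code would normally use a dedicated library such as Apache POI or PHPWord; this sketch only illustrates that the body is plain, namespaced XML.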

 

See "Office Open XML and the ECMA-376 Specification" ( [4] ) for more.

Relationship between the Office Open XML (OOXML) specification and versions of Microsoft Office



The OOXML format of Microsoft Office 2007 was published as ECMA-376 1st edition in December 2006. In December 2008, ECMA-376 2nd edition was published, corresponding to the ISO/IEC 29500:2008 specification.

ECMA-376 and ISO/IEC 29500 have been revised in step with new versions of Office. The document format of Office 2016, the latest version at the time of writing, corresponds to ECMA-376 5th edition and ISO/IEC 29500:2016. ECMA-376 is freely available, whereas the ISO/IEC 29500 documents must be purchased.

Correspondence between Office versions, ECMA-376, and ISO/IEC 29500

| Office version | Release date | ECMA | ISO/IEC |
|---|---|---|---|
| Office 2007 | January 30, 2007 | ECMA-376 Edition 1 | (none) |
| Office 2008 (Mac version) | 2008 | ECMA-376 Edition 2 | ISO/IEC 29500:2008 |
| Office 2010 | June 17, 2010 | ECMA-376 Edition 3 | ISO/IEC 29500:2011 |
| Office 2013 | February 7, 2013 | ECMA-376 Edition 4 | ISO/IEC 29500:2012 |
| Office 2016 | September 23, 2015 | ECMA-376 Edition 5 | ISO/IEC 29500-1:2016, ISO/IEC 29500-3:2015, ISO/IEC 29500-4:2016 |

 

ISO/IEC 29500 is divided into four parts, Part 1 to Part 4. As the list of ISO/IEC 29500 specifications in the ISO catalog ( [5] ) shows, the latest revision year differs from part to part. For example, the latest edition of Part 2 (ISO/IEC 29500-2) is the 2012 edition.

  • ISO/IEC 29500-1:2016
    Information technology — Document description and processing languages — Office Open XML File Formats — Part 1: Fundamentals and Markup Language Reference
    (2008 and 2012 editions withdrawn)
  • ISO/IEC 29500-2:2012
    Information technology — Document description and processing languages — Office Open XML File Formats — Part 2: Open Packaging Conventions
    (2008 edition withdrawn)
  • ISO/IEC 29500-3:2015
    Information technology — Document description and processing languages — Office Open XML File Formats — Part 3: Markup Compatibility and Extensibility
    (2012 edition withdrawn)
  • ISO/IEC 29500-4:2008/Cor 1:2010
  • ISO/IEC 29500-4:2016
    Information technology — Document description and processing languages — Office Open XML File Formats — Part 4: Transitional Migration Features
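Part 2, the Open Packaging Conventions, defines how the parts inside the ZIP are tied together by relationship parts; the package-level `_rels/.rels` points to the main document part. The Python sketch below (the helper name `main_part_name` is hypothetical) follows that relationship instead of hard-coding `word/document.xml`. The two relationship-type URIs are, to our understanding, the Transitional and Strict spellings of the officeDocument relationship; verify them against the specification before relying on them.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts (shared by Transitional and Strict packages).
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Assumed officeDocument relationship types: Transitional first, then Strict.
OFFICE_DOCUMENT_TYPES = {
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
    "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument",
}


def main_part_name(package_path):
    """Return the ZIP entry name of the main document part, or None if not found."""
    with zipfile.ZipFile(package_path) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
    for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
        if rel.get("Type") not in OFFICE_DOCUMENT_TYPES:
            continue
        if rel.get("TargetMode", "Internal") != "Internal":
            continue  # external targets are not parts inside the ZIP
        # Package-level targets are relative to the package root;
        # strip a leading "/" so the result matches the ZIP entry name.
        return rel.get("Target").lstrip("/")
    return None
```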

 

Compatibility of Microsoft Office and OOXML


In Word 2013 and later, you can choose between two docx formats from the Save As menu: Word Document (docx) and Strict Open XML Document (docx). The difference lies in which conformance class of ISO/IEC 29500 the file follows.

ISO/IEC 29500-4 (Part 4) defines the Transitional version of OOXML, which is nearly the same as ECMA-376 1st edition and permits legacy features of older Office document formats, such as VML.

Full ISO/IEC 29500 compliance therefore comes in two types: the Strict type, which conforms to Parts 1 to 3 only, and the Transitional type, which also uses Part 4. Office 2007 reads and writes ECMA-376 1st edition. Office 2010 can read the Strict type but cannot write it.

Office 2013 is the first version that can read and write the Strict type, meaning it is the first version to fully comply with ISO/IEC 29500.
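One practical way to tell the two docx types apart is the XML namespace of the main document element: Transitional files use `http://schemas.openxmlformats.org/wordprocessingml/2006/main`, while Strict files use `http://purl.oclc.org/ooxml/wordprocessingml/main`. The following is a minimal Python sketch, assuming the main part is stored at `word/document.xml` as Word writes it; `docx_conformance` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

TRANSITIONAL_W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_W = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def docx_conformance(docx_path):
    """Guess 'strict' or 'transitional' from the namespace of w:document."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # ElementTree reports qualified tags as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_W:
        return "strict"
    if namespace == TRANSITIONAL_W:
        return "transitional"
    return "unknown"
```

Checking the namespace rather than the file extension matters because both variants use the same `.docx` extension.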

 

Reference material