How are images embedded in Word documents handled?

Table of Contents
  1. Saving images embedded in Word documents
  2. Handling image resolution
  3. Resolution handling when converting docx documents to PDF with OSDC

When an image is embedded in a PDF, low resolution may go unnoticed on a screen or web page, but the same image will look blurry when printed.(*)

If the source document for the PDF is edited in Microsoft Word, Word may reduce the resolution of an image depending on the editing operation and Word's settings. The image can then come out blurry even if the original had a high resolution, so care is needed when creating PDFs for printing.

This investigation looks at how images are handled when they are embedded in Word documents and when PDFs are created from docx files with Office Server Document Converter (OSDC).

(*) Note that "resolution" can mean either the pixel dimensions of an image (px) or its pixel density (pixels per inch, ppi).
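To make the two meanings concrete, here is a minimal sketch of how a fixed pixel count turns into different pixel densities depending on the printed size. The function name and the sample sizes are illustrative assumptions, not values from the survey.

```python
def effective_ppi(pixels: int, printed_inches: float) -> float:
    """Pixels per inch when `pixels` are spread over `printed_inches`."""
    if printed_inches <= 0:
        raise ValueError("printed size must be positive")
    return pixels / printed_inches


# The same 600 px wide image at two hypothetical print widths.
print(effective_ppi(600, 2.0))  # 300.0
print(effective_ppi(600, 6.0))  # 100.0
```

The pixel count is identical in both cases; only the density changes, which is why an image that looks fine on screen can still print blurry.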

 


Saving images embedded in Word documents


This section explains the results of a survey covering two image formats, PNG and JPEG. The survey environment was as follows:

  • Word version: 2013 (32-bit)
  • Operating environment: Windows 10 (64-bit)

The Word document file format (docx) is defined by Office Open XML (OOXML). (For an overview of OOXML, see: What is Office Open XML (OOXML)?)

We created a docx file containing a PNG image and a JPEG image, unzipped it (an OOXML file is a ZIP package), and examined the contents of the resulting folder. The result is shown in the following figure.

Example of image in Word docx file

figure 1.1 Image in docx
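The same inspection can be done without unpacking the file by hand. The following is a minimal sketch using Python's standard zipfile module. It assumes Word's usual package layout, in which embedded images sit under word/media/. The function name is hypothetical.

```python
import zipfile


def list_embedded_images(docx_path):
    """Return (name, size_in_bytes) for each file under word/media/ in a docx.

    Assumption: images are stored under word/media/, as Word normally does.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.startswith("word/media/") and not info.is_dir()
        ]
```

Calling list_embedded_images on a docx file saved before and after an editing operation is a quick way to see whether the stored image data changed size.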

The body text of a Word document is stored in document.xml. Examining this file shows that the placement and size of each image are described in one of two ways: the Vector Markup Language (VML) shape format (old format) or the OOXML shape format (new format).

 

When an image is dragged and dropped onto the Word editing screen, it is saved in the old format. However, if the image is copied and pasted onto the editing screen, or inserted with the "Image" option in Word's "Insert" menu, it is saved in the new format.

 

* In Word 2019, images are saved in the OOXML shape format (new format) in both cases.
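One way to check which format a given document uses is to count the relevant elements in document.xml. The sketch below assumes that new-format pictures appear as a:blip elements (DrawingML) and old-format pictures as v:imagedata elements (VML). The function name is hypothetical, and a document that stores both a primary and a fallback representation of the same object could be counted twice.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace URIs for DrawingML (new format) and VML (old format).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"


def count_image_formats(docx_path):
    """Count picture references in word/document.xml by shape format."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return {
        # a:blip references the image data of a DrawingML picture.
        "new_format": sum(1 for _ in root.iter(f"{{{A_NS}}}blip")),
        # v:imagedata references the image data of a VML shape.
        "old_format": sum(1 for _ in root.iter(f"{{{V_NS}}}imagedata")),
    }
```

Running this on one document created by drag and drop and another created by copy and paste should show the difference described above.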

 

 

Handling image resolution


How Word handles the resolution of an embedded image depends on Word's editing settings, and which setting applies depends on the format in which the image is saved.

  • In the new format

    Under "File" > "Options", Word has settings for handling images in the file (figure below). To stop Word from changing the resolution of your images automatically, check "Do not compress images in file". This option works the same way in Word 2016.

Screenshot of advanced Word options

figure 1.2 New format file option

  1. "Do not compress images in file" is unchecked (default)

    When the display size of an image is changed, the resolution of the image itself (its number of vertical and horizontal pixels) also changes. The resulting pixel count depends on the display size of the image and on the resolution (ppi) set in the options. A docx file saved by Word contains only the compressed image, so the original cannot be restored.

  2. "Do not compress images in file" is checked

    Even if the display size of the image is changed, the resolution of the image itself (its number of vertical and horizontal pixels) does not change. The original image resolution remains intact.

  • In the old format

    Handling follows the "Figure Compression" setting on the "Figure Tools" > "Format" tab, which appears when a figure is selected. By default, this is set so that the image resolution (number of vertical and horizontal pixels) does not change. In that case, changing the display size of the figure leaves the resolution of the image unchanged. For the old format, the "File" > "Options" settings for handling images in the file have no effect.
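For the new format with compression enabled, the relationship between display size, the ppi setting, and the stored pixel count can be sketched as follows. This is an estimate under two assumptions that the survey does not confirm: that Word resamples to display size multiplied by the target ppi, and that it never enlarges an image beyond its original pixel count. The 220 ppi figure is only an illustrative setting.

```python
def compressed_pixels(original_px: int, display_inches: float, target_ppi: int) -> int:
    """Estimate the pixel count along one axis after compression.

    Assumptions: the image is resampled to display size x target ppi,
    and an image that is already smaller is left unchanged.
    """
    wanted = round(display_inches * target_ppi)
    return min(original_px, wanted)


# A 3000 px wide photo displayed 4 inches wide with a 220 ppi target.
print(compressed_pixels(3000, 4.0, 220))  # 880
# A 600 px wide image at the same display size is already below the target.
print(compressed_pixels(600, 4.0, 220))   # 600
```

In the first case most of the original pixels are discarded on save, which is the loss that cannot be undone from the docx file alone.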

Resolution handling when converting docx documents to PDF with OSDC


When creating a PDF from a docx file with OSDC, images in PNG or JPEG format can be decompressed once and then recompressed.

In that case, the compression method can be specified separately for color, grayscale, and monochrome images, and a downsampling method can also be specified. If compression or downsampling is specified, the resolution of the images taken from the docx file changes according to those settings. See the OSDC command line interface parameter settings for details.

If no parameters are set, images in PNG or JPEG format are embedded in the PDF as they are by pass-through (that is, the binary image data is embedded in the PDF as is, without decompression or recompression).

 