Convert docx files to HTML


Our proprietary analysis program parses docx files created in Word and converts them into HTML5- or XHTML 1.0-compliant HTML that is much simpler than Word's standard HTML output and free of extra tags.

 


 

 


 

HTML source code output from Word's standard features

Convert with Word's standard function

Word has a standard function that converts a document to HTML when it is saved. However, so that the layout and styling can be reproduced and the file re-edited in Word, a large number of layout and style specifications are written directly into "style" attributes on the tags for text, images, and so on. This generally makes the output unsuitable for publishing on the Web and difficult to customize or modify.

In some cases the output also lacks a proper HTML structure: the page reproduces the Word layout to some extent in a Web browser, but it uses tags that do not match the structure of the document.

 

 


HTML source code output from HTML on Word

Convert with HTML on Word

"HTML on Word" analyzes the contents of the docx file, minimizes layout and style information, and converts the structure of the text in the Word document into an appropriate HTML structure. Because there are no extra layout or style specifications, the generated HTML is simple and easy to customize or modify.
Layout and style can be specified separately using CSS, which makes it easy to keep a web page's HTML structure separate from its design.

 

 

Convert Word's style to HTML tags


Styles, paragraphs, etc. specified in Word are analyzed and converted into equivalent HTML tags for output.
The table below lists some of the tags to be converted. For detailed conversion specifications, please refer to "Conversion Specifications" in the online manual.

 

Main converted styles and tags

| Word's style | HTML tag |
|---|---|
| Body text | <body>-</body> |
| Heading 1 to 6 (Outline level 1 to 6) | <h1>-<h6> (Note: for HTML5, a <section> is output for each heading.) |
| Heading 7 to 9 (Outline level 7 to 9) | <p class="l7">-<p class="l9"> |
| Paragraph (normal) | <p>-</p> |
| Bullets | <ul><li>-</li></ul> |
| Paragraphs with numbering | <ol><li>-</li></ol> |
| Image | <img src="Path of output image"> |
| Table | <table><tbody><tr><td>-</td></tr></tbody></table> |
| Table style option: Title row | <thead><tr><td>-</td></tr></thead> |
| Table style option: First column | <tr><th>-</th><td>-</td>-</tr> |
| Table cells | <td>-</td> |
| Hyperlink | <a href="URL">-</a> |
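The mapping in the table can be pictured as a simple lookup. The following is only an illustrative sketch, not the converter's actual code: the function name `tagForStyle` and the use of the table's row labels as keys are assumptions made for this example.

```javascript
// Illustrative lookup modeled on the table of converted styles and tags.
// tagForStyle and its key strings are hypothetical, not part of the product.
const SIMPLE_STYLES = new Map([
  ["Paragraph (normal)", { tag: "p" }],
  ["Bullets", { tag: "ul", item: "li" }],
  ["Paragraphs with numbering", { tag: "ol", item: "li" }],
  ["Image", { tag: "img" }],
  ["Table", { tag: "table" }],
  ["Hyperlink", { tag: "a" }],
]);

function tagForStyle(styleName) {
  const heading = /^Heading ([1-9])$/.exec(styleName);
  if (heading) {
    const level = Number(heading[1]);
    // Levels 1-6 map to <h1>-<h6>; levels 7-9 become <p class="l7">-<p class="l9">.
    return level <= 6
      ? { tag: `h${level}` }
      : { tag: "p", className: `l${level}` };
  }
  // Unknown style names return null in this sketch.
  return SIMPLE_STYLES.get(styleName) ?? null;
}
```

For example, `tagForStyle("Heading 8")` yields a `p` tag with the class `l8`, matching the row for Heading 7 to 9.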

 

Other HTML elements and tags to be converted/outputted

  • Output HTML version: HTML5 or XHTML 1.0

  • Header/Meta information: head, title, meta, link, style, script

  • Paragraph numbering and ordered lists: Normal paragraph or ol class="Numbering type", li

  • Paragraph style name (optional): class="Style name"

  • Image and shape formats: JPEG or PNG for images, SVG for line art

  • Layout options: Specify layout options by class

  • Output position of figures set to “With Text Wrapping”: After the block they are anchored to

  • Formula: SVG by default, optional MathML or OMath output

  • Inline elements: strong, sub, sup, ruby, rp, rt, with optional italic, underline, and strikethrough output

  • Text color: Optionally, style color

  • Links and cross-references: External links, cross-references, links from auto-generated ToC to main text headings

  • Paragraph text alignment: class attribute

  • Endnotes: Anchor on the endnote mark and link to the note text, which is output at the end of the document


    Etc.

 

Convert Word's ToC to Web page ToC


The "Table of Contents" that can be automatically created in Word is converted into text links that can be used like a table of contents on a web page.
Text links generated for each heading (outline level) make it easy to navigate to the desired heading.

 


 

 NEW!  Enhanced table of contents conversion

A number of enhancements have been made to the table of contents to make it easier to lay out and more convenient to use.

  • The entire table of contents section is now output as a <nav class="toc-wrap"> tag in HTML5 (a <div class="nav-area"> tag in XHTML).

  • So that the table of contents can be loaded from a separate file, the contents of the tag above are now enclosed in a <div id="toc"> tag.

  • The class attribute of the heading paragraph of the table of contents now outputs "toc-heading" *1.

  • "toc-[n]" *1 ([n] is the value of the table of contents level 1-6) is now output for the class attribute of the paragraph for each item in the table of contents.

  • When the output is split into multiple HTML files, the table of contents is output to every split file. In each file, "active" is output as the class attribute of the <p> tag of the table of contents item that points to that file (the item at the highest hierarchical level within the page).

  • When the output is split into multiple HTML files, the table of contents can be output as a separate HTML file (toc-inc.html) by specifying an option. *2

  • Tags are output so that buttons for showing and hiding the table of contents can be added for display on mobile devices.

    Note: JavaScript and CSS are required to add and operate the buttons. Please use the sample that has the buttons in place.

    Samples

*1 This is the default value when the table of contents is inserted with Word's "Built-In" table of contents feature and left unedited.

*2 Only the inside of the <nav> tag is output as a separate HTML file for loading with JavaScript. Tags such as <html>, <head>, and <body> are not output.
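Loading the separately output table of contents with JavaScript could look roughly like the sketch below. It assumes that toc-inc.html sits in the same directory as the page and that the page contains a placeholder element matching `nav.toc-wrap`. Both are assumptions for illustration, as are the function names `tocUrlFor` and `loadToc`. The bundled samples remain the reference for the intended setup.

```javascript
// Resolve toc-inc.html relative to the current page.
// Assumption: the fragment is stored next to the split HTML files.
function tocUrlFor(pageUrl) {
  return new URL("toc-inc.html", pageUrl).href;
}

// Fetch the fragment and place it inside the table of contents container.
// Assumption: "nav.toc-wrap" exists in the page as an empty placeholder.
async function loadToc(doc, pageUrl, selector = "nav.toc-wrap") {
  const container = doc.querySelector(selector);
  if (!container) return false;
  const response = await fetch(tocUrlFor(pageUrl));
  if (!response.ok) return false;
  // The fragment has no <html>, <head>, or <body>, so it can be inserted as-is.
  container.innerHTML = await response.text();
  return true;
}

// In a browser, the loader would be started like this:
// loadToc(document, window.location.href);
```

Because the fragment is inserted as markup, this approach is only appropriate for a file you generated yourself and serve from your own site.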

 

 NEW! 

Split HTML output


HTML can now be output by splitting a Word document into chapters, sections, and other specified outline level units.

Specify the "-split" option followed by the desired outline level (1 to 3) when running the converter from the command line. The document is then split at the headings and paragraphs that have the specified outline level in the Word document, and each resulting part is output as its own HTML file.
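When the converter is launched from a script, the option can be assembled and validated before the command runs. This is a minimal sketch under stated assumptions: the helper name `splitArgs` is hypothetical, the level is assumed to be passed as a separate argument after "-split", and the executable name and file arguments are left out because they are not given here.

```javascript
// Build only the "-split" portion of the command line.
// The executable name and input/output arguments are intentionally omitted;
// see the online manual for the full command syntax.
function splitArgs(level) {
  if (!Number.isInteger(level) || level < 1 || level > 3) {
    throw new RangeError("outline level for -split must be 1, 2, or 3");
  }
  return ["-split", String(level)];
}
```

Rejecting values outside 1 to 3 early keeps an invalid level from reaching the converter.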


Splitting pages turns even long documents into compact, easy-to-read Web pages, because it reduces the amount of scrolling per page and keeps the file size loaded at one time to a minimum.

If the document contains a table of contents inserted with Word's table of contents function, the table of contents and its links are output to all HTML files.
The table of contents can also be output as a separate HTML file by specifying an option. In this case, each HTML file split by outline levels will not output the table of contents. The output HTML file of the table of contents can be loaded into each HTML file using JavaScript, or used to create a page for the table of contents.

Please refer to the sample that reads the HTML file of the table of contents.

Samples

 

 NEW!  Page navigation can also be output


 

When outputting split HTML, the "-pagenavi" option can be used to output "Prev/Next" links that allow the user to move through the split HTML pages in order.

Links are output at the top and bottom of the body text. The output link can be in Japanese or English.

 

| Parameter / Value | Output |
|---|---|
| -pagenavi ja | "前へ" "次へ" |
| -pagenavi [other than ja, or no value] | "Prev" "Next" |

Note: Where there is no previous or next page, as on the first or last page, the corresponding link is not output.
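The behavior described above can be modeled in a few lines. This sketch is not the converter's implementation. It only mirrors the documented rules: the labels depend on whether the value is "ja", and a link is left out when there is no page in that direction. The function name `pageNavLinks` and the file names are hypothetical.

```javascript
// Labels as documented for the -pagenavi option.
const NAV_LABELS = {
  ja: { prev: "前へ", next: "次へ" },
  other: { prev: "Prev", next: "Next" },
};

// pages: ordered list of split HTML file names; index: position of the current page.
function pageNavLinks(pages, index, lang) {
  const labels = lang === "ja" ? NAV_LABELS.ja : NAV_LABELS.other;
  const links = [];
  // No "previous" link on the first page.
  if (index > 0) links.push({ label: labels.prev, href: pages[index - 1] });
  // No "next" link on the last page.
  if (index < pages.length - 1) links.push({ label: labels.next, href: pages[index + 1] });
  return links;
}
```

With three pages, the first page gets only a "next" link, the middle page gets both, and the last page gets only a "previous" link.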