Professional Formatting Solutions
WordMLToFO Stylesheet V2.1

What is WordMLToFO Stylesheet?

WordMLToFO Stylesheet is an XSLT stylesheet that transforms WordML, the new file format introduced with Microsoft Word 2003, into an FO XML file compliant with the W3C Recommendation "Extensible Stylesheet Language (XSL)". XSL is a specification for laying out and formatting XML documents. Using the WordMLToFO stylesheet, you can generate FO files from Word documents for any purpose.
Operation with other XSL-FO formatting applications is not guaranteed.
WordML is formally called "Wordprocessor Markup Language". Until now, the main native file formats of Microsoft Word have been the binary format (.doc extension) and Rich Text Format (.rtf extension). WordML is an XML file format that is described as fully compatible with these native formats. In addition, WordML has the following features.
You can build the following XML applications based on WordML features:
The WordMLToFO stylesheet is an application of the latter kind. Microsoft released the WordML specification in November 2003. If you are interested in WordML, you can download the specification from the following URL:
The Office 2003 XML Reference Schemas include a document titled "Overview of WordprocessingML" that briefly explains the WordML structure with examples.
The WordMLToFO stylesheet generates the FO file using the following functions.
The WordMLToFO stylesheet maps WordML elements to XSL-FO elements as follows.
| Document element | WordML Element | XSL-FO Element |
|---|---|---|
| Paragraph | w:p | fo:block |
| Inline (text-run) | w:r | fo:inline |
| Bullet and numbering | w:p (paragraph that has w:pPr/w:listPr) | fo:list-block, fo:list-item, fo:list-item-label, fo:list-item-body |
| Table | w:tbl, w:tr, w:tc | fo:table, fo:table-row, fo:table-cell |
| Image | w:pict | fo:external-graphic |
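The first two rows of the mapping table can be illustrated with a minimal Python sketch. It is not the stylesheet's actual XSLT code: the helper names `convert_children` and `wordml_to_fo` are hypothetical, only `w:p`, `w:r`, and `w:t` are handled, and the `fo:flow` wrapper is chosen here purely for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML 2003 namespace
FO = "http://www.w3.org/1999/XSL/Format"                    # XSL-FO namespace

# Two rows of the mapping table: WordML tag -> XSL-FO tag.
TAG_MAP = {
    f"{{{W}}}p": f"{{{FO}}}block",   # paragraph
    f"{{{W}}}r": f"{{{FO}}}inline",  # text run
}

def convert_children(src, dst):
    """Append the FO equivalent of each child of src to dst (hypothetical helper)."""
    for child in src:
        if child.tag == f"{{{W}}}t":
            # w:t holds the text of a run; copy it into the current FO element.
            dst.text = (dst.text or "") + (child.text or "")
        elif child.tag in TAG_MAP:
            convert_children(child, ET.SubElement(dst, TAG_MAP[child.tag]))
        # Properties, lists, tables, and images are skipped in this sketch.

def wordml_to_fo(xml_text):
    """Return an fo:flow element holding the converted paragraphs."""
    flow = ET.Element(f"{{{FO}}}flow")
    convert_children(ET.fromstring(xml_text), flow)
    return flow

SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
    "</w:body>"
)
flow = wordml_to_fo(SAMPLE)
```

Here `flow` ends up holding one `fo:block` with two `fo:inline` children, one per text run. Lists, tables, and images need more than a tag rename, which is why they are left out.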
A Word document contains many styles, which are applied to paragraphs, text runs, and tables; the document is then formatted according to the result of applying those styles. Styles comprise table styles, paragraph styles, and character styles. In contrast, XSL-FO has no concept of styles: every formatting property must be written in the FO file as the final, resolved value after the styles have been applied. The WordMLToFO stylesheet therefore applies the following styles and outputs the resolved result to the FO file.
| Document Element | Condition | Applied Styles |
|---|---|---|
| Paragraph | Paragraph inside the table | Table style, Paragraph style |
| Paragraph | Paragraph outside the table | Paragraph style |
| Inline (text-run) | Inline in the paragraph inside the table | Table style, Paragraph style, Character style |
| Inline (text-run) | Inline in the paragraph outside the table | Paragraph style, Character style |
| Row or Cell in the Table | - | Table style |
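The style resolution in the table above can be sketched as a merge of property dictionaries. This is only an illustration: the function name `resolve_properties` and the sample property values are hypothetical, and the assumption that a later style overrides an earlier one (table, then paragraph, then character) is not stated in the source.

```python
def resolve_properties(*style_layers):
    """Merge style layers in order; a later layer overrides an earlier one.

    The override order is an assumption made for this sketch.
    """
    resolved = {}
    for layer in style_layers:
        if layer:  # a layer may be None when no style of that kind applies
            resolved.update(layer)
    return resolved

# Hypothetical styles expressed as XSL-FO property names and values.
table_style = {"font-size": "10pt", "color": "black"}
paragraph_style = {"font-size": "12pt", "font-weight": "normal"}
character_style = {"color": "red"}

# Inline in a paragraph inside a table: table, paragraph, and character styles.
inline_in_table = resolve_properties(table_style, paragraph_style, character_style)

# Inline in a paragraph outside a table: paragraph and character styles only.
inline_outside = resolve_properties(None, paragraph_style, character_style)
```

The resulting dictionary would be written out as attributes on the generated FO element, since the FO file has no way to refer back to a named style.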
WordMLToFO is a stylesheet based on the W3C XSLT 1.0 Recommendation. It uses some extension functions for handling result tree fragments (RTF). So far, it has been tested with the following XSLT processors.
| XSLT Processor | Notes |
|---|---|
| Saxon 6.5.3 | Tested using Sun Java SDK, Java 2 Platform, Standard Edition 1.4.1 or higher. Saxon 7 has not been tested yet. Instant Saxon is not supported. |
| MSXML3, MSXML4 | Line layout calculation is simplified. |
| .NET | The EXSLT.NET library is required to run the stylesheet, so you need to add the ExsltTransform class to the calling program. For details, refer to "Building Practical Solutions with EXSLT.NET". |
The current implementation has the following limitations.
Since the document models of Microsoft Word and XSL-FO are fundamentally different, perfect conversion is not possible. The formatted result may therefore not be output correctly.
Word supports many types of fields. WordMLToFO Stylesheet transforms fields using their "result text". Most fields have an element that holds the result text, but there are exceptions. For instance, WordMLToFO Stylesheet cannot obtain the result text from special field types such as list boxes.
In Word, the tab character is useful for positioning text within a line and is widely used when creating documents. In contrast, XSL-FO has no direct equivalent of the tab character. WordMLToFO Stylesheet transforms the tab character (w:tab) into an XSL-FO fo:leader object, but the original layout cannot be reproduced exactly.
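As a rough illustration of the tab handling, the Python sketch below turns a run containing `w:tab` into an `fo:inline` containing an `fo:leader`. The helper name `run_to_inline` is hypothetical, and the `leader-pattern="space"` attribute is an assumption; the source does not say which leader properties the stylesheet actually emits.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML 2003 namespace
FO = "http://www.w3.org/1999/XSL/Format"                    # XSL-FO namespace

def run_to_inline(run):
    """Convert one w:r element to fo:inline, mapping w:tab to fo:leader."""
    inline = ET.Element(f"{{{FO}}}inline")
    last = None  # most recently added child; text after it goes into its tail
    for child in run:
        if child.tag == f"{{{W}}}t":
            text = child.text or ""
            if last is None:
                inline.text = (inline.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        elif child.tag == f"{{{W}}}tab":
            # Assumed attribute: a blank leader standing in for the tab.
            last = ET.SubElement(inline, f"{{{FO}}}leader", {"leader-pattern": "space"})
    return inline

RUN = f'<w:r xmlns:w="{W}"><w:t>Name</w:t><w:tab/><w:t>Value</w:t></w:r>'
inline = run_to_inline(ET.fromstring(RUN))
```

A leader only fills space within the line. It carries no tab-stop position, which is consistent with the limitation described above.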
Auto Shape is used to draw graphics in a Word document. The current WordMLToFO Stylesheet implementation does not support Auto Shape.
The current WordMLToFO Stylesheet implementation does not support footnotes or endnotes.
The line height might not be correctly set.
A word in a Word document that has the hyphenation setting applied is divided as follows in WordML:
<w:t>Fo</w:t> <w:t>r</w:t> <w:t>matter</w:t>
For that reason, the word is also divided in the transformed FO. As a result, the word cannot be hyphenated.
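The effect of this division can be seen in a small Python sketch. It assumes each `w:t` fragment sits in its own `w:r` run (the excerpt above shows only the `w:t` elements), and the helper name `text_fragments` is hypothetical. Joining the fragments is shown only to make the problem visible; it is not something the stylesheet is known to do.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML 2003 namespace

# Assumed structure: one run per fragment of the word "Formatter".
PARAGRAPH = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Fo</w:t></w:r><w:r><w:t>r</w:t></w:r><w:r><w:t>matter</w:t></w:r>"
    "</w:p>"
)

def text_fragments(paragraph_xml):
    """Return the text of every w:t element in document order."""
    root = ET.fromstring(paragraph_xml)
    return [t.text or "" for t in root.iter(f"{{{W}}}t")]

fragments = text_fragments(PARAGRAPH)  # what a run-by-run conversion works with
word = "".join(fragments)              # the word as the author typed it
```

A run-by-run conversion emits one inline per fragment, so the whole word never appears as a single piece of text in the FO.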
The following is the result of using MSXML4 as the XSLT processor. (The document content is fictional.)
Word view of the original Word document
Result of formatting the FO with XSL Formatter
You can download sample data from here.
The WordMLToFO Stylesheet (except for the source code) is built into V4.1. Please download the V4.1 evaluation version and check the formatted result of your own documents.
| Name | WordMLToFO Stylesheet |
|---|---|
| Price | $200 |
| Contents | Stylesheet source file, external Java library (.jar file), User's manual, Sample Data |
| License | Customer may install and use one copy of the product on a single computer. |
If you are interested in WordMLToFO Stylesheet, please feel free to contact us via E-mail.