I18N Index Library v2

Antenna House I18N Index Library is a Java library for building index pages in many languages. It is called from DocBook to XSL-FO and DITA to XSL-FO stylesheets.

What can I18n Index Library do?

I18n Index Library accepts indexterm data from an XSLT stylesheet and sorts it according to the document language. The stylesheet then converts the sorted results into the appropriate XSL-FO.
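
Language-dependent sorting can be illustrated with a minimal sketch. It uses the JDK's java.text.Collator as a stand-in and is not the I18n Index Library API; the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: JDK Collator stands in for the library's sorting.
class LocaleSortSketch {
    /** Returns a copy of the terms sorted by the collation rules of the given language tag. */
    static List<String> sortTerms(List<String> terms, String languageTag) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(languageTag));
        List<String> sorted = new ArrayList<String>(terms);
        Collections.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("cherry", "Banana", "apple");
        // Plain String ordering would put "Banana" first because of its capital letter.
        System.out.println(sortTerms(terms, "en")); // prints [apple, Banana, cherry]
    }
}
```

A collator compares letters before case, which is why the result differs from a plain code-point sort.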

I18N-Index-Library

  • The I18N Support Library is an open-source library developed by Innodata Isogen, Inc. and licensed under the LGPL. Antenna House modified the original library and, as the LGPL requires, publishes the modified version under the LGPL as well. This library is included in this release.
  • The I18N Index Library, developed by Antenna House, is an interface library between the stylesheet and the I18N Support Library. It provides sorting for indexes in the supported languages.
  • The DocBook to FO stylesheet is a sample stylesheet developed by Antenna House. It passes index data to the library and builds the index formatting objects from the sorted results. This release contains both XSLT 1.0 and XSLT 2.0 stylesheets for convenience. You can modify the stylesheets according to your PDF output requirements.
  • The XSL Formatter outputs PDF from XSL-FO and implements XSL 1.1 indexing features. Antenna House XSL Formatter V6 is recommended.

DocBook to FO stylesheet

– Implements many DocBook indexterm features using XSL 1.1 index functions

  • Three-level nested index structure (primary, secondary and tertiary elements).
  • Range index (class attribute values startofrange and endofrange).
  • Significant index (significance attribute).
  • “See” and “See Also” references (see and seealso elements).

– Supports major Java based XSLT processors.

  • XSLT 1.0 processor: Saxon 6.5.5, Xalan-J 2.7.1/2.7.2
  • XSLT 2.0 processor: Saxon-B 9.1, Saxon-PE/EE 9.2 or later

– Includes sample XSLT 1.0/2.0 based stylesheets.

  • XSLT 1.0 stylesheets for Saxon 6.5.5, Xalan-J 2.7.1/2.7.2
  • XSLT 2.0 stylesheets for Saxon-B 9.1, Saxon-PE/EE 9.2 or later

– You can use the sortas attribute to correct the index order for Simplified Chinese.

For example, the Chinese word “粘贴” is placed in the “N” index group because the most common reading of “粘” is “nian2”. However, the correct reading in this word is “zhan1”.

<indexterm><primary>粘贴</primary></indexterm>

You can correct this problem by specifying the correct reading (pinyin) in the sortas attribute. This places “粘贴” in the “Z” index group.

<indexterm><primary sortas="zhan1 tie1">粘贴</primary></indexterm>
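
The effect of sortas on the index group can be sketched as follows. This is a simplified illustration, assuming the group is the first letter of the reading; the helper is hypothetical and not part of the library API, and the default reading is passed in by hand rather than looked up in collation data.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not the I18n Index Library API.
class SortAsGroupSketch {
    /**
     * Picks the index group letter from the explicit sortas reading when present,
     * otherwise from the default reading supplied by the caller.
     */
    static String groupFor(String defaultReading, String sortAs) {
        boolean hasSortAs = sortAs != null && !sortAs.trim().isEmpty();
        String reading = hasSortAs ? sortAs.trim() : defaultReading;
        return reading.substring(0, 1).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "nian2 tie1" stands in for the default reading assumed for the word.
        System.out.println(groupFor("nian2 tie1", null));         // prints N
        System.out.println(groupFor("nian2 tie1", "zhan1 tie1")); // prints Z
    }
}
```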

 

PDF5-ML plug-in

– Supports DITA indexterm features

  • Multiple level nested <indexterm> elements.
  • Multiple <index-see> and <index-see-also> elements.
  • <index-sort-as> element.
  • Range index (indexterm/@start and @end attributes).

– Newly created XSLT 2.0 stylesheets for DITA to XSL-FO transformation

  • Supports many DITA 1.2 elements and attributes. For instance, the <figurelist> and <tablelist> elements are implemented in addition to the <indexlist> element.
  • All styles are defined in external style definition files in the “config” directory. One style definition file is created per language code. A default file and the English and CJK files are bundled with this plug-in. You can change the look of the manual by editing only the style definition file, without editing the stylesheet files.
  • If you want to customize the stylesheet algorithms, add your customization stylesheets to the “customize” folder and include them in dita2fo_custom.xsl. You can use the full power of XSLT 2.0 for the DITA to FO transformation.
  • If an indexterm/@start has no corresponding indexterm/@end, the stylesheet automatically closes the indexterm range according to the DITA specification. This is done for range indexterm elements in topicref/topicmeta, in topic/metadata, and at body level.
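
The range-closing rule in the last item can be sketched in Java. The plug-in itself does this in XSLT 2.0, so the data model below is purely hypothetical. The sketch only shows the bookkeeping: collect the ranges opened in a scope and report those never closed, which are the ones to close at the end of that scope.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration; not the plug-in's actual implementation.
class RangeCloseSketch {
    /** One range marker: start == true models indexterm/@start, false models indexterm/@end. */
    static final class Marker {
        final String name;
        final boolean start;
        Marker(String name, boolean start) { this.name = name; this.start = start; }
    }

    /** Returns the names of ranges opened in this scope but never closed, in opening order. */
    static List<String> unclosedRanges(List<Marker> scope) {
        Set<String> open = new LinkedHashSet<String>();
        for (Marker m : scope) {
            if (m.start) {
                open.add(m.name);
            } else {
                open.remove(m.name);
            }
        }
        return new ArrayList<String>(open);
    }

    public static void main(String[] args) {
        List<Marker> scope = Arrays.asList(
            new Marker("install", true),
            new Marker("config", true),
            new Marker("install", false));
        System.out.println(unclosedRanges(scope)); // prints [config]
    }
}
```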

 NOTE: The PDF5-ML plug-in is independent of this I18n Index Library release. It can be downloaded from GitHub: https://github.com/AntennaHouse/pdf5-ml

– Independent Java library plug-in

  • The I18n Index Library plug-in can be shared by other plug-ins in the DITA Open Toolkit. For instance, PDF, HTML and EPUB plug-ins can use it if they need an index sorting function.
  • Installation is simple: copy the I18n Index Library plug-in folder into the [DITA-OT]\plugins folder and integrate it from the command line using ant.

– Processing diagram for DITA Open Toolkit

dita_workflow

  • The DITA Open Toolkit (DITA-OT) is an open-source publishing system for publishing XML instances written in DITA.
  • PDF5 is a sample DITA-OT plug-in developed by Antenna House. It implements most DITA 1.2 index features and outputs PDF using Antenna House Formatter.
  • I18n Index Library plug-in is an independent DITA-OT plug-in that enables multiple DITA-OT plug-ins to use the I18n Index Library feature.

– Supports 50 language indexes.

Arabic, Bulgarian, Catalan, Czech, Danish, German, Greek, English, Spanish, Estonian, Persian (Farsi), Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Icelandic, Japanese, Kazakh, Khmer, Kannada, Korean, Lao, Lithuanian, Latvian, Malay, Burmese (Myanmar), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Sinhala, Slovak, Slovenian, Swedish, Swahili, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Vietnamese, Simplified Chinese, Traditional Chinese

– Supports derivatives for language codes.

  • For instance, you can define one index configuration shared by pt, pt-BR and pt-PT.
  • List these language codes in the index configuration file botb_index_rules.xml as follows:
    <index_config>
    <national_language>pt</national_language>
    <national_language>pt-PT</national_language>
    <national_language>pt-BR</national_language>
    <description>
    <p>Portuguese index configuration</p>
    </description>
    ...
    </index_config>
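
The idea of one configuration shared by several language codes can be sketched as a simple lookup table. This is an illustration under stated assumptions, not the library's configuration loader: the class and method names are hypothetical, and codes are matched exactly as written.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup for illustration; not how the library reads its configuration.
class LanguageConfigSketch {
    private final Map<String, String> configByLanguage = new HashMap<String, String>();

    /** Registers one configuration under several codes, like repeated national_language entries. */
    void register(String configName, String... languageCodes) {
        for (String code : languageCodes) {
            configByLanguage.put(code, configName);
        }
    }

    /** Returns the configuration name for a language code, or null when the code is not listed. */
    String lookup(String languageCode) {
        return configByLanguage.get(languageCode);
    }

    public static void main(String[] args) {
        LanguageConfigSketch configs = new LanguageConfigSketch();
        configs.register("Portuguese index configuration", "pt", "pt-PT", "pt-BR");
        System.out.println(configs.lookup("pt-BR")); // prints Portuguese index configuration
        System.out.println(configs.lookup("pt-AO")); // prints null (code not listed)
    }
}
```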

NOTE: The corresponding language codes for supported languages are ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, he, hi, hr, hu, id, it, is, ja, kk, km, kn, ko, lo, lt, lv, ms, my, nl, no, pl, pt, ro, ru, si, sk, sl, sv, sw, ta, te, th, tl, tr, uk, vi, zh-CN, zh-TW

General element sorting function

  • Sorts any kind of elements
  • I18n Index Library provides one more class, named jp.co.antenna.ah_i18n_general sort. You can call a static method of this class to sort any kind of element.
  • Supply the sort key through the @sort-key and @sort-as attributes of the target element.

– Differences between <xsl:sort> and I18n Index Library

  • By calling the static method of the jp.co.antenna.ah_i18n_general sort class, you can group the sorted results by the @group-key attribute that the library returns.
  • Language-specific grouping cannot be done with the standard <xsl:sort>.
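
The difference between plain sorting and sorting with group keys can be sketched as follows. This is a minimal illustration, not the library's API: the Item class is hypothetical, the group keys are supplied by hand as stand-ins for the @group-key values the library would return, and the JDK Collator replaces the library's sorting.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model for illustration; not the I18n Index Library API.
class GroupedSortSketch {
    /** An element to sort: its sort key plus the group key assumed to come from the library. */
    static final class Item {
        final String sortKey;
        final String groupKey;
        Item(String sortKey, String groupKey) { this.sortKey = sortKey; this.groupKey = groupKey; }
    }

    /** Sorts items by sort key for the locale, then collects them under their group keys in order. */
    static Map<String, List<String>> sortAndGroup(List<Item> items, Locale locale) {
        final Collator collator = Collator.getInstance(locale);
        List<Item> sorted = new ArrayList<Item>(items);
        Collections.sort(sorted, new Comparator<Item>() {
            @Override
            public int compare(Item a, Item b) {
                return collator.compare(a.sortKey, b.sortKey);
            }
        });
        Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
        for (Item item : sorted) {
            List<String> members = groups.get(item.groupKey);
            if (members == null) {
                members = new ArrayList<String>();
                groups.put(item.groupKey, members);
            }
            members.add(item.sortKey);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item("banana", "B"), new Item("apple", "A"), new Item("avocado", "A"));
        System.out.println(sortAndGroup(items, Locale.ENGLISH)); // prints {A=[apple, avocado], B=[banana]}
    }
}
```

A stylesheet could then emit one index heading per group key and list that group's entries below it.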

Item                      Contents
Java Runtime Environment  Oracle JRE 7 or later
XSLT Processor            Saxon 6.5.5, Xalan-J 2.7.1/2.7.2, Saxon-B 9.1, Saxon-PE/EE 9.2 or later
XML Parser                Java bundled parser
DITA Open Toolkit         1.7.5 or later
XSL Formatter             Antenna House Formatter V6

 NOTE:

  • Saxon-HE 9.2 is not supported because it does not allow calls to external Java libraries from XSLT stylesheets.
  • DITA Open Toolkit 1.7.5 or later bundles Saxon-B 9.1 as its XSLT processor.

TYPE OF LICENSE          PRICE
I18n Index Library V2.3  $5,000
Annual Maintenance       $1,000

* This library can sort only Unicode characters in the BMP (Basic Multilingual Plane). Hanzi and other characters outside the BMP are not supported.
* I18n Support Library contains the index definition for some languages other than those indicated in this document. These additional languages are not officially tested or supported by Antenna House.
* This release contains sample batch files which only operate in a Windows environment.
* The DocBook zone and the startref attributes of the indexterm element are not supported.
* I18n Index Library V2.3 is tested with the ICU4J 56.1 release (icu4j-56_1.jar). Refer to the ICU site for details.
* Hanzi collation is derived from the Unicode 6.0 Unihan database. Refer to the Unihan Database for details.
* The sorting architectures of the PDF5-ML plug-in and the PDF2 plug-in bundled with DITA-OT differ significantly. Integrating I18n Index Library into the PDF2 plug-in is neither tested nor officially supported.
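
Because only BMP characters are supported for sorting, it can be useful to check index terms before passing them to the library. The following is a minimal sketch of such a check using standard JDK calls; the class and method names are hypothetical and not part of the library.

```java
// Hypothetical pre-check for illustration; not part of the I18n Index Library.
class BmpCheckSketch {
    /** Returns true when the text contains no surrogate code units, i.e. only BMP characters. */
    static boolean isBmpOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            // Characters outside the BMP are stored in Java strings as surrogate pairs.
            if (Character.isSurrogate(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String bmpTerm = "粘贴";
        // U+20000 is a Hanzi in the Supplementary Ideographic Plane, outside the BMP.
        String nonBmpTerm = new String(Character.toChars(0x20000));
        System.out.println(isBmpOnly(bmpTerm));    // prints true
        System.out.println(isBmpOnly(nonBmpTerm)); // prints false
    }
}
```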