Hyphenation

AH Formatter V6.5 can hyphenate text in more than 40 languages. You do not need to prepare a hyphenation dictionary.

Languages

AH Formatter V6.5 supports hyphenation for the following languages.

Code (2-letter) | Code (3-letter) | Language | Hyphenation limited to
af | afr | Afrikaans | Latin characters and Apostrophe
bg | bul | Bulgarian | Cyrillic characters
ca | cat | Catalan | Latin characters and Apostrophe and Decimal point (Full stop or Middle dot)
cs | ces | Czech | Latin characters
cy | cym | Welsh | Latin characters and Apostrophe
da | dan | Danish | Latin characters and Apostrophe
de | deu | German / Swiss German | Latin characters and Apostrophe
el | ell | Greek | Greek characters
en | eng | English | Latin characters and Apostrophe
en-US | eng-US | American | Latin characters and Apostrophe
eo | epo | Esperanto | Latin characters
es | spa | Spanish | Latin characters
et | est | Estonian | Latin characters
eu | eus | Basque | Latin characters
fi | fin | Finnish | Latin characters
fr | fra | French / Canadian French | Latin characters and Apostrophe
ga | gle | Irish (Erse or Gaelic) | Latin characters and Apostrophe
hr | hrv | Croatian | Cyrillic characters or Latin characters
hu | hun | Hungarian | Latin characters
id | ind | Indonesian | Latin characters and Apostrophe and Digit 2
is | isl | Icelandic | Latin characters
it | ita | Italian | Latin characters and Apostrophe
la | lat | Latin | Latin characters
lt | lit | Lithuanian | Latin characters
lv | lav | Latvian | Latin characters
ms | msa | Bahasa Malay | Latin characters and Apostrophe and Digit 2
mt | mlt | Maltese | Latin characters and Apostrophe
nb | nob | Norwegian (Bokmål) | Latin characters and Apostrophe
nl | nld | Dutch / Flemish | Latin characters and Apostrophe
nn | nno | Norwegian (Nynorsk) | Latin characters and Apostrophe
no | nor | Norwegian | Latin characters and Apostrophe
pl | pol | Polish | Latin characters
pt | por | Portuguese / Brazilian | Latin characters
ro | ron | Romanian / Moldavian | Latin characters and Apostrophe
ru | rus | Russian | Cyrillic characters
sk | slk | Slovak | Latin characters and Apostrophe
sl | slv | Slovenian | Latin characters and Apostrophe
sr | srp | Serbian | Cyrillic characters or Latin characters
sv | swe | Swedish | Latin characters and Apostrophe
sw | swa | Swahili | Latin characters and Apostrophe
th | tha | Thai | Thai characters
tr | tur | Turkish | Latin characters
uk | ukr | Ukrainian | Cyrillic characters

For hyphenation, AH Formatter V6.5 treats a character string as a word only if it consists entirely of the characters listed for that language in the table above. A string that contains any other character is not treated as a word and is not hyphenated. If you need hyphenation for unsupported characters, you will need to use a TeX dictionary.

Example

To use Czech hyphenation, place the following in the FO file:

<fo:block hyphenate="true" language="ces">
  Všichni lidé rodí se svobodní a sobě rovní co do důstojnosti a práv.
  Jsou nadáni rozumem a svědomím a mají spolu jednat v duchu bratrství.
</fo:block>

Exception Dictionary

AH Formatter V6.5 does not require you to prepare a dictionary. However, you may want to override the way certain words are hyphenated when the default result is not what you expect. In that case, you can register those words in the exception dictionary. If you edit the exception dictionary while working in the GUI, you can reload the hyphenation dictionary and reformat the document from [menu] - [Format] - [Reload Hyphenation Dictionary].

The exception dictionary is stored in the hyphenation folder under the AH Formatter V6.5 installation folder, or in the folder indicated by the AHF65_HYPDIC_PATH environment variable (AHF65_64_HYPDIC_PATH for the 64-bit version). The dictionary file is named after the language code, following the same rules as for the TeX dictionary.

For example: de.xml, en_US.xml and so on. When xml:lang="nl-BE" is specified, dictionaries are searched for in the following order.

  1. nl-BE.xml
  2. nl_BE.xml
  3. nl.xml
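
As a minimal sketch, a block such as the following would trigger that lookup. The Dutch sentence is only sample text, and the fo prefix is assumed to be bound to the XSL-FO namespace, as in the Czech example.

```xml
<!-- xml:lang="nl-BE" makes the formatter look for nl-BE.xml, then nl_BE.xml, then nl.xml -->
<fo:block hyphenate="true" xml:lang="nl-BE">
  Alle mensen worden vrij en gelijk in waardigheid en rechten geboren.
</fo:block>
```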

The exception dictionary consists of the following elements.

Element | Location | Description
<hyphenation-info> | root element |
<hyphen-char> | child of <hyphenation-info> | Specifies the character that can be used in place of <hyphen/> in the <exceptions> element. The character is given by the value attribute. The initial value is "-" (U+002D).
<exceptions> | child of <hyphenation-info> | The exception dictionary data. The text of the <exceptions> element is a whitespace-separated list of hyphenated words. A hyphenation point is indicated by the <hyphen> element, or by the character specified in the <hyphen-char> element.
<hyphen> | child of <exceptions> | A fully functional hyphen, equivalent to a TeX discretionary. The <hyphen> element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs, the post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs, and the no attribute gives the string that appears when no hyphenation break occurs. Use <hyphen> when the spelling of a word changes at a hyphenation break.
<non-eol-words> | child of <hyphenation-info> | A whitespace-separated list of words that should not end a line. The formatter tries to keep these words away from the end of a line, although in some cases this is unavoidable. This processing is always in effect, independent of the hyphenate property in the FO.

The DTD of the exception dictionary is simple:

<!ELEMENT hyphenation-info (hyphen-char?, exceptions?, non-eol-words?)>

<!ELEMENT hyphen-char EMPTY>
<!ATTLIST hyphen-char value CDATA #REQUIRED>


<!ELEMENT exceptions (#PCDATA|hyphen)*>

<!ELEMENT hyphen EMPTY>
<!ATTLIST hyphen pre  CDATA #IMPLIED>
<!ATTLIST hyphen no   CDATA #IMPLIED>
<!ATTLIST hyphen post CDATA #IMPLIED>

<!ELEMENT non-eol-words (#PCDATA)>

Suppose the following exception dictionary is prepared.

<hyphenation-info>
<exceptions>
ta-ble
present
ba<hyphen pre="k" no="c"/>ken
</exceptions>
</hyphenation-info>

The word table can be hyphenated only as ta-ble, and the word present is never hyphenated. The word backen is hyphenated as bak-ken. In this example, ta<hyphen/>ble is exactly equivalent to ta-ble.
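
A fuller dictionary that also uses <hyphen-char> and <non-eol-words> might look like the following sketch. The choice of "=" as the marker character and the listed words are illustrative assumptions, not values taken from a shipped dictionary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hyphenation-info>
  <!-- Mark break points with "=" instead of the initial "-" (illustrative choice) -->
  <hyphen-char value="="/>
  <exceptions>
    ta=ble
    present
  </exceptions>
  <!-- Words the formatter tries to keep away from the end of a line -->
  <non-eol-words>a an the</non-eol-words>
</hyphenation-info>
```

The element order follows the DTD above: <hyphen-char>, then <exceptions>, then <non-eol-words>.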

The <hyphen> element can also specify hyphenation that changes the spelling of a word. The following table shows how each attribute combination affects the unbroken word and its hyphenated form.

Settings for Exception Dictionary | Word | Hyphenation
ab<hyphen/>def | abdef | ab-def
ab<hyphen no="c"/>def | abcdef | ab-def
ab<hyphen pre="x"/>def | abdef | abx-def
ab<hyphen pre="x" no="c"/>def | abcdef | abx-def
ab<hyphen post="z"/>def | abdef | ab-zdef
ab<hyphen no="c" post="z"/>def | abcdef | ab-zdef
ab<hyphen pre="x" post="z"/>def | abdef | abx-zdef
ab<hyphen pre="x" no="c" post="z"/>def | abcdef | abx-zdef
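
To make the abstract rows concrete, here is a small sketch of entries for a German dictionary (de.xml) that follow traditional, pre-1996 German orthography. The two words are chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hyphenation-info>
  <!-- pre/no as in the table above: "no" is used when unbroken, "pre" before the hyphen -->
  <exceptions>
    Zu<hyphen pre="k" no="c"/>ker
    Schif<hyphen pre="f"/>fahrt
  </exceptions>
</hyphenation-info>
```

Following the rules in the table, the first entry matches the word Zucker and breaks it as Zuk-ker, and the second matches Schiffahrt and breaks it as Schiff-fahrt.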

The exception dictionary is also available for the following languages:

Code (2-letter) | Code (3-letter) | Language | Hyphenation limited to
km | khm | Khmer (V6.5) | Khmer characters
lo | lao | Lao (V6.5) | Lao characters
my | mya | Burmese (Myanmar) (V6.5) | Burmese characters
th | tha | Thai | Thai characters

For these languages, the exception dictionary is not used for hyphenation but to specify words that must not be broken. Each word may contain only the word-constituent characters of its language. Hyphenated words and the <hyphen> element cannot be specified in <exceptions>.
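
As a hedged sketch, a Thai dictionary (th.xml) that keeps one word from being broken could look like this. The word is only an example, and the file name assumes the naming rules described under Exception Dictionary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hyphenation-info>
  <!-- Plain words only: no hyphen characters and no <hyphen> elements -->
  <exceptions>
    ประเทศไทย
  </exceptions>
</hyphenation-info>
```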

TeX Dictionary

AH Formatter V6.5 can also hyphenate using a TeX dictionary. To hyphenate with the TeX dictionary, specify HyphenationOption="false" in the Option Setting File. A dictionary is then required for every language you need. Dictionaries are XML files in the same format as those used by FOP. See also the Apache Website. Only the hyphenation dictionary for English (en.xml) is ready and provided with XSL Formatter V4.0.

To hyphenate with the TeX dictionary for certain languages only, specify those languages in hyphenation-TeX in the Option Setting File.

See Exception Dictionary for the file name and location of the TeX dictionary.

The contents of the TeX hyphenation dictionary are defined in hyphenation.dtd, which is included in the FOP distribution. With AH Formatter V6.5, it is installed in the hyphenation folder under the installation folder. Below is a brief explanation of the DTD. Refer to hyphenation.dtd for more details.

Element | Location | Description
<hyphenation-info> | root element |
<hyphen-char> | child of <hyphenation-info> | Specifies the character that marks hyphenation points in the exception dictionary data. The character is given by the value attribute. The initial value is "-" (U+002D). The hyphenation character in the actual formatted result is given by the hyphenation-character property of the XSL specification.
<hyphen-min> | child of <hyphenation-info> | The before and after attributes give the minimum number of characters in a hyphenated word before and after the hyphenation character when a hyphenation break occurs. The before attribute corresponds to the XSL hyphenation-remain-character-count property, and the after attribute to the XSL hyphenation-push-character-count property. AH Formatter V6.5 uses these properties and ignores the hyphen-min element in the dictionary.
<classes> | child of <hyphenation-info> | Defines character equivalence classes. The text of the classes element is a whitespace-separated list of character groups, and all characters in a group are treated as equivalent. In practice each group consists of a lowercase character and its uppercase form. The following is a sample from the English dictionary (en.xml): aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ
<patterns> | child of <hyphenation-info> | The hyphenation patterns, separated by spaces. A pattern consists of characters and digits. Each character is the first character of a group in classes (normally the lowercase one). Digits between characters indicate the strength of the hyphenation potential (hyphenation value).
<exceptions> | child of <hyphenation-info> | The hyphenation exception data. The text of the exceptions element is a space-separated list of hyphenated words. A hyphenation point is indicated by the hyphen element, or by the character defined in the hyphen-char element. Use the exceptions element when the hyphenation points determined by the patterns are not appropriate, or when you want to use special hyphenation of your own.
<hyphen> | child of <exceptions> | A fully functional hyphen. The hyphen element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs, the post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs, and the no attribute gives the string that appears when no hyphenation break occurs. Use the hyphen element when the spelling of a word changes at a hyphenation break.
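
Putting these elements together, a TeX dictionary has roughly the following shape. This is a skeleton under stated assumptions: the element names and order follow hyphenation.dtd as distributed with FOP, and the handful of patterns are illustrative only, far too few to serve as a usable pattern set.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<hyphenation-info>
  <hyphen-char value="-"/>
  <!-- Present for FOP compatibility; AH Formatter V6.5 ignores hyphen-min -->
  <hyphen-min before="2" after="3"/>
  <classes>
    aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ
  </classes>
  <exceptions>
    ta-ble
  </exceptions>
  <!-- Illustrative patterns only; a real dictionary contains thousands -->
  <patterns>
    hy3ph he2n hena4 hen5at 1na n2at 1tio 2io
  </patterns>
</hyphenation-info>
```

In each pattern the digits sit between letters: odd values mark a permitted break at that position and even values forbid one, with the highest value winning where patterns overlap.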

Restrictions

If text is placed in a narrow region and a single word is hyphenated at more than one point, the result sometimes does not follow the exception dictionary. See also Hyphenation in Technical Notes.