AH Formatter / XSL Formatter
Q&A



Q.  When I format a document that mixes English (written left to right) and Arabic (written right to left), the parentheses enclosing the English text are not displayed correctly. Is it a bug? [No.2002102508]
A. 

It's not a bug. It follows the Unicode spec.

The directionality of Unicode text is determined by the bidirectional (bidi) algorithm defined in the Unicode specification. The Latin alphabet runs from left to right; the Arabic and Hebrew alphabets run from right to left.
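The direction of each script can be checked from the Unicode character database. The following is a minimal sketch using Python's standard unicodedata module; the names SAMPLES and bidi_classes are made up for this illustration, and one letter per script is assumed to be representative.

```python
import unicodedata

# One sample letter per script mentioned above (an illustrative choice).
SAMPLES = {
    "Latin": "A",
    "Arabic": "\u0627",  # ARABIC LETTER ALEF
    "Hebrew": "\u05D0",  # HEBREW LETTER ALEF
}


def bidi_classes(samples):
    """Map each label to the Bidi_Class of its sample character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


print(bidi_classes(SAMPLES))
# {'Latin': 'L', 'Arabic': 'AL', 'Hebrew': 'R'}
```

'L' is the strong left-to-right class; 'R' and 'AL' (Arabic letter) are the strong right-to-left classes.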

The enclosing parentheses are neutral characters: they have no direction of their own and can be used in both left-to-right and right-to-left text. When a parenthesis is resolved to right-to-left, its glyph is replaced by the mirror image. This is called 'mirroring'.

In other words, because parentheses are both neutral and subject to mirroring, a left parenthesis '(' is displayed as a right parenthesis ')' and a right parenthesis ')' is displayed as a left parenthesis '(' when they take the right-to-left direction.
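Both properties of the parentheses (neutral and mirrored) are recorded in the Unicode character database. As a small sketch, again with Python's unicodedata module (the helper name describe is only for this illustration):

```python
import unicodedata


def describe(ch):
    """Return (Bidi_Class, is_mirrored) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


for ch in "()A":
    print(repr(ch), describe(ch))
# '(' ('ON', True)
# ')' ('ON', True)
# 'A' ('L', False)
```

'ON' (Other Neutral) means the character has no direction of its own, and the mirrored flag marks it as a candidate for mirroring in right-to-left text.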

In the following example, suppose 'Arabic' stands for an Arabic word, 'English1' and 'English2' stand for English words, and English2 is enclosed in parentheses. The text is written in this order:

<fo:block-container writing-mode="rl-tb" language="ar">
<fo:block>
Arabic English1 (English2)
</fo:block>
</fo:block-container>

When displayed in the right-to-left writing mode, it appears as follows:

(English1 (English2 cibarA

The Unicode specification says '... Generally, neutrals take on the direction of the surrounding text.' A neutral that sits between two runs of left-to-right text therefore becomes left-to-right. So the left parenthesis between 'English1' and 'English2' is treated as left-to-right.

However, the Unicode specification also says 'In case of a conflict, they take on the embedding direction.' The right parenthesis after 'English2' has left-to-right text before it and the end of the paragraph after it, so it takes the embedding direction, which is right-to-left because writing-mode="rl-tb" is specified on fo:block-container. The right parenthesis is then mirrored into a left parenthesis and placed at the end of the line, which is the left side under writing-mode="rl-tb".
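The two rules quoted above can be traced with a toy model. The sketch below is not an implementation of the Unicode bidirectional algorithm and is not how AH Formatter works internally; it only handles strong letters and neutrals in a single paragraph, and it ignores numbers, explicit embeddings and everything else in the specification. It assumes that uppercase letters stand in for right-to-left letters (so 'ARABIC' plays the role of the Arabic word), that lowercase letters and digits are left-to-right, and that only '(' and ')' are mirrored. The names MIRROR, bidi_class and visual_order are hypothetical.

```python
MIRROR = {"(": ")", ")": "("}  # only the pair discussed here


def bidi_class(ch):
    """Toy classifier: uppercase = right-to-left, other letters/digits = left-to-right."""
    if ch.isupper():
        return "R"
    if ch.isalnum():
        return "L"
    return "N"  # neutral: spaces, parentheses, ...


def visual_order(text, base="R"):
    """Return text in left-to-right display order for the given base direction."""
    n = len(text)
    classes = [bidi_class(ch) for ch in text]

    # Resolve each run of neutrals: take the direction of the surrounding
    # text, or the base (embedding) direction when the two sides conflict.
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else base
        after = classes[j] if j < n else base
        classes[i:j] = [before if before == after else base] * (j - i)
        i = j

    # Assign embedding levels: odd = right-to-left, even = left-to-right.
    if base == "R":
        levels = [1 if c == "R" else 2 for c in classes]
    else:
        levels = [0 if c == "L" else 1 for c in classes]

    # Mirror parentheses that ended up right-to-left.
    chars = [MIRROR.get(ch, ch) if lvl % 2 else ch for ch, lvl in zip(text, levels)]

    # Reorder: from the highest level down to the lowest odd level,
    # reverse every contiguous run at that level or higher.
    odd = [lvl for lvl in levels if lvl % 2]
    if odd:
        for level in range(max(levels), min(odd) - 1, -1):
            i = 0
            while i < n:
                if levels[i] < level:
                    i += 1
                    continue
                j = i
                while j < n and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                i = j
    return "".join(chars)


print(visual_order("ARABIC english1 (english2)"))
# (english1 (english2 CIBARA
```

The result has the same shape as the displayed line above: the closing parenthesis resolves to right-to-left, is mirrored, and ends up at the left edge.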

There are two ways to avoid this. First, use the Unicode control character U+200E (LRM, LEFT-TO-RIGHT MARK). LRM is a zero-width character with left-to-right directionality. If you add a left-to-right character immediately after '(English2)', the right parenthesis is surrounded by left-to-right text and also becomes left-to-right.

<fo:block-container writing-mode="rl-tb" language="ar">
<fo:block>
Arabic English1 (English2) &#x200E;
</fo:block>
</fo:block-container>

This character is also effective in Internet Explorer. In Notepad on Windows 2000/XP, LRM can be entered from [Insert Unicode control character] in the right-click menu.
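If the text is generated by a program, the mark can be appended in code instead of being typed by hand. A minimal sketch, assuming the source text is an ordinary Python string (the constant LRM and the helper with_trailing_lrm are names invented for this illustration):

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK


def with_trailing_lrm(text):
    """Append LRM so that a trailing neutral has left-to-right text on both sides."""
    return text + LRM


fixed = with_trailing_lrm("Arabic English1 (English2)")
print(unicodedata.name(LRM), unicodedata.bidirectional(LRM), unicodedata.category(LRM))
# LEFT-TO-RIGHT MARK L Cf
```

Bidi class 'L' is what makes the mark act like a left-to-right letter, and category 'Cf' (format) identifies it as a format character rather than a visible one.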

Second, use fo:bidi-override. Enclose '(English2)' in <fo:bidi-override unicode-bidi="embed" direction="ltr">:

<fo:block-container writing-mode="rl-tb" language="ar">
<fo:block>
Arabic English1
<fo:bidi-override unicode-bidi="embed" direction="ltr">
(English2)
</fo:bidi-override>
</fo:block>
</fo:block-container>
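For comparison, in plain text without markup, a left-to-right embedding is expressed with the control characters U+202A (LRE) and U+202C (PDF); the CSS and XSL definitions of unicode-bidi="embed" describe the property in terms of these characters. The sketch below only builds such a string and is an illustration, not something the FO example above requires; the helper name embed_ltr is hypothetical.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_ltr(text):
    """Wrap text in a left-to-right embedding (plain-text analogue of embed + ltr)."""
    return f"{LRE}{text}{PDF}"


wrapped = embed_ltr("(English2)")
print([unicodedata.name(ch) for ch in (wrapped[0], wrapped[-1])])
# ['LEFT-TO-RIGHT EMBEDDING', 'POP DIRECTIONAL FORMATTING']
```

In XSL-FO, the markup form is generally the better choice, because the scope of the embedding is visible in the document structure.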

Note that the half-width space is also a neutral character, so its position can shift in the same way. Use either of the methods above to fix it.

For more information, see 'Unicode Standard Annex #9: The Bidirectional Algorithm'.


Copyright © 1999-2011 Antenna House, Inc. All rights reserved.
Antenna House is a trademark of Antenna House, Inc.