
This is more of a curiosity. While working on a multilingual web application, I noticed that certain characters such as punctuation marks (!?.;,) at the end of a block element are rendered as if they were placed at the beginning when the writing direction is right-to-left (as is the case for certain Asian languages I do not speak).

In other words, the string

Hello, World!

is rendered as

!Hello, World

when placed in a div block with direction: rtl.

This becomes even more evident if the text is split into two parts that are given different colors: a contiguous chunk of text at the end is rendered in two separate regions:

http://jsfiddle.net/22Qk9/

What's the point of this behavior? I guess this must be a peculiarity of (all?) right-to-left languages which is automatically handled by the browser, so I don't need to care about it, or should I?

GOTO 0
  • I am by no means an expert in RTL languages, but I would venture the guess that "!" is used the same in RTL languages and thus abides by the RTL language's rules, while interspersed non-RTL text is rendered in native LTR direction. You can see this back and forth between RTL and non-RTL words when going to the Arabic Wikipedia, for instance. – deceze Dec 27 '13 at 10:51
  • This has to do with Unicode bidi (bidirectional) strings: the same text block contains RTL as well as LTR text, and the browser tries to figure out how to display it. There's a `unicode-bidi` CSS property you can use to control what's displayed, https://css-tricks.com/almanac/properties/u/unicode-bidi/, though I think its use is discouraged. – Quang Van Jul 06 '21 at 22:01

3 Answers


If you want to fix this behavior, add the LRM character (U+200E LEFT-TO-RIGHT MARK, written `&lrm;` in HTML) at the end of the text. It is a non-printing character.

Source : http://dotancohen.com/howto/rtl_right_to_left.html

Example : http://jsfiddle.net/yobjj6ed/
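When the text is produced by script rather than written in markup, the same fix can be sketched as below. The helper name `withTrailingLrm` is made up for illustration; the only fixed part is the code point U+200E.

```javascript
// U+200E LEFT-TO-RIGHT MARK: invisible, but it counts as a strong LTR character.
const LRM = "\u200E";

// Hypothetical helper: append LRM so that trailing punctuation sits between
// two LTR characters and stays attached to the preceding LTR text.
function withTrailingLrm(text) {
  return text.endsWith(LRM) ? text : text + LRM;
}

const fixed = withTrailingLrm("Hello, World!");
console.log(fixed.length); // 14 (13 visible characters plus the mark)
```

The check with `endsWith` only keeps the mark from being appended twice.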

shim
kchetan

The reason is that the exclamation mark “!” has the bidi class ON (Other Neutral), which effectively means that it adapts to the directionality of the surrounding text. In the example, the base direction is right to left, so the mark is placed to the left of the text before it. This is quite correct for languages written right to left: the terminating punctuation mark appears at the end, i.e. on the left.
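As a rough illustration of that rule, here is a deliberately simplified model of what happens to trailing neutral characters. It is not the Unicode bidi algorithm itself; the function name `visualOrder` is made up, and it assumes plain Latin text that starts with a letter and contains no brackets.

```javascript
// Simplified model of ONE effect of the Unicode bidi algorithm (UAX #9):
// neutral characters after the last LTR letter take the base direction.
// Assumption: `text` is plain Latin text that starts with a letter and has
// no brackets (mirroring and leading neutrals are not modelled).
function visualOrder(text, baseDirection) {
  if (baseDirection !== "rtl") return text;
  // Find the trailing run of characters that are neither letters nor digits.
  const match = text.match(/[^\p{L}\p{N}]+$/u);
  if (!match) return text;
  const body = text.slice(0, match.index);
  // The trailing neutrals form a right-to-left run, so they are laid out
  // in reverse and end up to the left of the LTR body.
  const trailing = [...match[0]].reverse().join("");
  return trailing + body;
}

console.log(visualOrder("Hello, World!", "rtl")); // !Hello, World
console.log(visualOrder("Hello, World!", "ltr")); // Hello, World!
```

The comma and space inside the sentence stay put because they sit between two LTR letters; only the neutrals with nothing LTR after them follow the base direction.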

Normally, you use the CSS code direction: rtl or, preferably, the HTML attribute dir=rtl for texts in a language that is written right to left, and only for them. For them, this behavior is a solution, not a problem.

If you instead use direction: rtl or dir=rtl just for special effects, like making table columns laid out right to left, then you need to consider the implications. For example, in the table case, you would need to set direction to ltr for each cell of the table (unless you want them to be rendered as primarily right to left text).
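For the table case, a minimal sketch of markup generated from script might look as follows. The helpers `escapeHtml` and `rtlColumnsTable` are hypothetical names; the point is only that `dir="rtl"` goes on the table and `dir="ltr"` on every cell.

```javascript
// Hypothetical helper: escape text before placing it into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: columns run right to left (dir="rtl" on the table),
// while each cell is reset to left-to-right text (dir="ltr" on the td).
function rtlColumnsTable(rows) {
  const body = rows
    .map((cells) => {
      const tds = cells.map((cell) => `<td dir="ltr">${escapeHtml(cell)}</td>`);
      return `<tr>${tds.join("")}</tr>`;
    })
    .join("");
  return `<table dir="rtl">${body}</table>`;
}

console.log(rtlColumnsTable([["Hello, World!", "Done."]]));
// <table dir="rtl"><tr><td dir="ltr">Hello, World!</td><td dir="ltr">Done.</td></tr></table>
```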

If you have, say, an English sentence quoted inside a block of Arabic text, then you need to set the directionality of an element containing the English text to ltr, e.g.

<blockquote dir=ltr>Hello, World!</blockquote>

A similar case (just with Arabic inside English text) is discussed as use case 6 in the W3C document What you need to know about the bidi algorithm and inline markup (which has a few oddities, though, like using cite markup for quoted text, against W3C recommendations).

mikemaccana
Jukka K. Korpela

The accepted answer https://stackoverflow.com/a/20799360/477420 works if you can control the markup or CSS around the value. If you have no control over the HTML, the following approach could work.

If you don't know whether the page will be rendered RTL or LTR, but some text is definitely LTR (e.g. English-only), you can wrap the value in LRE/PDF marks to signify that it is an LTR region. The text will then be rendered LTR irrespective of the page's direction.

This works when you have code that renders text without being able to change the markup around it, e.g. when you render the value of a "song title" or "company name" field in some nested child component (or on the server side) without control over the surrounding HTML elements.

One drawback of this and similar approaches that add marks to the text (like the LRM proposal in another answer to this question): copying such a value from the resulting HTML page will generally preserve the marks, even though they are invisible and zero-width. For most cases that is fine, but consider whether it is a problem for you.
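If the invisible marks do become a problem, for instance when a pasted value is compared or stored, one option is to strip them again on input. A minimal sketch, where `stripBidiControls` is a made-up name and the character list covers the bidi formatting characters defined by Unicode:

```javascript
// Bidi formatting characters: LRM (U+200E), RLM (U+200F), ALM (U+061C),
// LRE/RLE/PDF/LRO/RLO (U+202A..U+202E) and the isolates
// LRI/RLI/FSI/PDI (U+2066..U+2069).
const BIDI_CONTROLS = /[\u200E\u200F\u061C\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: remove the invisible marks before comparing or storing.
function stripBidiControls(text) {
  return text.replace(BIDI_CONTROLS, "");
}

const pasted = "\u202AAlphabet Inc.\u202C";
console.log(stripBidiControls(pasted) === "Alphabet Inc."); // true
```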

Approximate sample code (some company names end in "Inc.", and the dot ends up at the beginning when the name is rendered as-is on an RTL page):

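 // companyName = "Alphabet Inc." should keep its dot at the end, even on an RTL page
 function wrapLtr(companyName) {
     // Printable ASCII only, so the value is definitely LTR text
     if (/^[\x20-\x7E]*$/.test(companyName)) {
         // LRE (U+202A) opens the LTR embedding, PDF (U+202C) closes it
         companyName = "\u202A" + companyName + "\u202C";
     }
     return companyName;
 }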

Details on LRE/PDF symbols can be found in https://unicode.org/reports/tr9/#Explicit_Directional_Embeddings:

LRE U+202A LEFT-TO-RIGHT EMBEDDING Treat the following text as embedded left-to-right.

PDF U+202C POP DIRECTIONAL FORMATTING End the scope of the last LRE, RLE, RLO, or LRO.

Some approaches to figuring out whether a string has RTL characters can be found in the questions "How to detect whether a character belongs to a Right To Left language?", "JavaScript: how to check if character is RTL?", and "How to detect if a string contains any Right-to-Left character?".
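One such check can be sketched with Unicode property escapes in a regular expression. This is only an approximation: it assumes that Hebrew and Arabic are the only RTL scripts you care about, and `containsRtl` is a made-up name.

```javascript
// Assumption: only the Hebrew and Arabic scripts are treated as RTL here.
// Other RTL scripts (Syriac, Thaana, N'Ko, ...) would need to be added.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;

// Hypothetical helper: true if the text contains at least one such character.
function containsRtl(text) {
  return RTL_SCRIPT.test(text);
}

console.log(containsRtl("Alphabet Inc."));            // false
console.log(containsRtl("\u05E9\u05DC\u05D5\u05DD")); // true (a Hebrew word)
```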

Alexei Levenkov