9

There is a strange behavior when trying to create a string that contains a Hebrew letter and a digit. The digit is always displayed to the left of the letter. For example:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;
textBlock1.Text = AB;
//Output bug: B is displayed to the left of A.

This bug only happens when using both a Hebrew letter and digits. When omitting one of those from the equation the bug won't happen:

string A = "\u20AA"; //Some random Unicode.
string B = "23";
string AB = A + B;
textBlock1.Text = AB;
//Output OK.

string A = "\u05E9"; //A Hebrew letter.
string B = "HELLO";
string AB = A + B;
textBlock1.Text = AB;
//Output OK.

I tried playing with the FlowDirection property, but it didn't help.

A workaround to get the text displayed properly in the first code example would be welcome.

Yaron Levi

4 Answers

15

The Unicode characters "RTL mark" (U+200F) and "LTR mark" (U+200E) were created precisely for this purpose.

In your example, simply place an LTR mark after the Hebrew character, and the numbers will then be displayed to the right of the Hebrew character, as you wish.

So your code would be adjusted as follows:

string A = "\u05E9"; //A Hebrew letter
string LTRMark = "\u200E"; 
string B = "23";
string AB = A + LTRMark + B;
Avi Shmidman
  • Great answer. Do you know if there's a way to determine if a string requires the LTRMark when being appended to? I'd like to always append B to the right of A, but obviously in most locales, this will be the default, so would be a wasted step. – Dan W Sep 27 '12 at 20:20
  • Before adding the LTRMark, you will want to check whether the character before the number is a character with RTL directionality. There's a good discussion, with a number of valid solutions, over here: http://stackoverflow.com/questions/4330951/how-to-detect-whether-a-character-belongs-to-a-right-to-left-language – Avi Shmidman Sep 28 '12 at 08:42
  • Thanks. I've just opened a 50 point bounty for this question. I'd be very grateful if you could take a look at it: http://stackoverflow.com/questions/12630566/parsing-through-arabic-rtl-text-from-left-to-right – Dan W Sep 29 '12 at 22:07
  • The problem with this solution is that it adds a new character to the string. For scenarios where you care about a fixed size character array, it will not help, I guess. – Veverke Dec 22 '16 at 16:49
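The check described in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: `BidiHelper`, `IsLikelyRtl`, and `AppendLeftToRight` are hypothetical names, and the code point ranges are a coarse approximation of right-to-left scripts rather than a real lookup of Unicode bidirectional classes.

```csharp
using System;

public static class BidiHelper
{
    // U+200E, the LTR mark used in the answer above.
    private const char LtrMark = '\u200E';

    // Assumption: a coarse range test standing in for a proper
    // bidirectional-class lookup. It covers the Hebrew, Arabic, and
    // neighbouring right-to-left script blocks plus their presentation forms.
    public static bool IsLikelyRtl(char c)
    {
        return (c >= '\u0590' && c <= '\u08FF')
            || (c >= '\uFB1D' && c <= '\uFDFF')
            || (c >= '\uFE70' && c <= '\uFEFC');
    }

    // Concatenates left and right, inserting an LTR mark between them
    // only when left ends with a character that looks right-to-left.
    public static string AppendLeftToRight(string left, string right)
    {
        if (string.IsNullOrEmpty(left) || string.IsNullOrEmpty(right))
        {
            return left + right;
        }

        bool needsMark = IsLikelyRtl(left[left.Length - 1]);
        return needsMark ? left + LtrMark + right : left + right;
    }
}
```

With this sketch, `BidiHelper.AppendLeftToRight("\u05E9", "23")` adds the mark, while `BidiHelper.AppendLeftToRight("\u20AA", "23")` is a plain concatenation, so strings that do not end in a right-to-left character are left unchanged.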
4

This is because of the Unicode Bidirectional Algorithm. If I understand this correctly, each Unicode character has an "identifier" that says where it should be placed when it's next to another word.

In this case \u05E9 says that it should be to the left. Even if you do:

var ab = string.Format("{0}{1}", a, b);

You will still get it to the left. However, if you take another Unicode character such as \u05D9, it will be added to the right, because that character is not said to be on the left.

This is the layout of the language, and when outputting the text the layout engine will render it according to that layout.

Filip Ekberg
  • Interesting, so how can you explain that in my first code example the letter is not to the left, as you said ? – Yaron Levi Jul 06 '11 at 11:09
  • +1 The `B` is not physically "to the left of `A`". That's only how your output device chooses to draw the character sequence. Internally everything is in order, or should I say בסדר! – Kerrek SB Jul 06 '11 at 11:21
  • @Yaron Levi based on the scenario u have mentioned, digits will be treated as a part of Hebrew unicode chars. so it still will be - right to left. i.e. digits first (as they came last) followed by unicode letter. – Nika G. Jul 06 '11 at 11:26
  • @Kerrek SB Sure I understand what you are saying. But "Internally" doesn't help me. I need to get the result I want displayed in a textBlock. – Yaron Levi Jul 06 '11 at 11:41
  • @Yaron: That depends entirely on your layout engine. If it applies the bidirectional algorithm, then it'll reorder the glyphs when rendering. The process is deterministic, and if you're unsure, you can read up the precise algorithm -- it's fairly involved, imagine mixing Hebrew, Arabic, numerals, Indic, Thai... and then there are explicit invisible directional override characters... good luck! – Kerrek SB Jul 06 '11 at 11:44
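The point made in the comments, that the stored order is unchanged and only the rendering differs, can be checked with a small sketch like the one below. `LogicalOrderDemo` and `Describe` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;

public static class LogicalOrderDemo
{
    // Formats each UTF-16 code unit of s as U+XXXX, in the order
    // the characters are stored in the string (the logical order).
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling `LogicalOrderDemo.Describe("\u05E9" + "23")` returns `U+05E9 U+0032 U+0033`: the Hebrew letter is still first in memory, and the reordering happens only when the layout engine draws the text.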
0

That strange behavior has an explanation. Digits next to Unicode characters are treated as part of the Unicode string, and since Hebrew is read right to left, this scenario:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;

displays B first, followed by A.

Second scenario:

string A = "\u20AA"; //Some random Unicode.
string B = "23";
string AB = A + B;

A is some Unicode character that is not part of a language read right to left, so the output is A first, followed by B.

Now consider my own scenario:

string A = "\u05E9";
string B = "\u05EA";
string AB = A + B;

Both A and B belong to a language read right to left, so AB is displayed as B followed by A, not A followed by B.

EDITED, to answer the comment:

Taking this scenario into account:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;

The only solution to get the letter followed by the digits is: string AB = B + A;

This is probably not a solution that will work in general, so I guess you have to implement some checks and build the string according to the requirements.

Nika G.
  • Ok, so is there a work around to get the result I want : Displaying the string in the order I want inside a textBlock ? – Yaron Levi Jul 06 '11 at 11:37
  • I tried this already but it doesn't work. The text displayed in the textBlock is still not in the correct order. – Yaron Levi Jul 06 '11 at 12:14
  • @Yaron Levi I guess this is asp.net project, so following code `textBlock1.Attributes.Add("dir", "rtl");` will let the control know that text direction is set to `rtl` (right to left) – Nika G. Jul 06 '11 at 12:25
  • It's a silverlight project. It not hosted in anything. Everything happens locally. I changed the FlowDirection property in XAML. But ,again, I really don't think it's related to our problem. I just mentioned it in case someone will suggest it as a possible solution. – Yaron Levi Jul 06 '11 at 12:34
  • @Yaron Levi Im not familiar with silverlight. as for .net 3.5 following code did the trick `string A = "\u05E9"; //A Hebrew letter string B = "23"; string AB = B + A; Label1.Attributes.Add("dir", "rtl"); Label1.Text = AB;` – Nika G. Jul 06 '11 at 12:40
0
string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = B + A; // !
textBlock1.Text = AB;
textBlock1.FlowDirection = FlowDirection.RightToLeft;
//Output OK: A is displayed to the left of B, as intended.
Yaron Levi