Long story short, I have a text like the following:

const string example1 = "good בוקר טוב morning";

I am trying to split the above text into 4 words (tokens, in my case) for processing. No matter which approach I take, the 2 Hebrew words come back in the wrong order. By order I mean the display order, not the logical order.

Using String.Split on example1:

Returned

  • good
  • בוקר
  • טוב
  • morning

Note that the 2 Hebrew words above are in the wrong position relative to how the string is displayed.
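A minimal repro of the split above, assuming whitespace is the only separator. The class and method names (SplitRepro, Tokenize) are just for illustration and are not from my real code:

```csharp
using System;

public static class SplitRepro
{
    // Passing a null separator array makes String.Split break on whitespace.
    // The tokens come back in logical (in-memory) order, not display order.
    public static string[] Tokenize(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    public static void Main()
    {
        const string example1 = "good בוקר טוב morning";
        foreach (string token in Tokenize(example1))
        {
            Console.WriteLine(token);
        }
    }
}
```

The console may need a UTF-8 output encoding and a suitable font to show the Hebrew tokens; inspecting the array in a debugger shows the same order.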

Expected

  • good
  • טוב
  • בוקר
  • morning
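To make the expected output concrete, here is a rough sketch of the reordering I mean. It is not a full solution: it assumes a left-to-right base direction, whitespace separators, and a hypothetical IsRtl check that treats any token containing a character in the Hebrew or Arabic blocks (U+0590 to U+06FF) as RTL. That is a crude stand-in for the Unicode Bidirectional Algorithm, and all names here are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VisualOrderSketch
{
    // Hypothetical helper: a token counts as RTL if it contains any character
    // from the Hebrew (U+0590..U+05FF) or Arabic (U+0600..U+06FF) blocks.
    private static bool IsRtl(string token) =>
        token.Any(c => c >= '\u0590' && c <= '\u06FF');

    // Returns whitespace-separated tokens in display order for an LTR paragraph:
    // each run of consecutive RTL tokens is emitted in reverse token order,
    // while the characters inside every token are left untouched.
    public static List<string> TokensInDisplayOrder(string text)
    {
        string[] logical = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>(logical.Length);

        int i = 0;
        while (i < logical.Length)
        {
            if (!IsRtl(logical[i]))
            {
                result.Add(logical[i]);
                i++;
                continue;
            }

            // Find the end of the current RTL run, then add it back to front.
            int start = i;
            while (i < logical.Length && IsRtl(logical[i]))
            {
                i++;
            }
            for (int j = i - 1; j >= start; j--)
            {
                result.Add(logical[j]);
            }
        }

        return result;
    }
}
```

Calling VisualOrderSketch.TokensInDisplayOrder(example1) yields the four tokens in the Expected order listed above. It ignores digits, punctuation, and embedded directional marks, which is exactly why I am looking for something more generic.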

I have tried Regex.Split, forcing the invariant culture, and even appending Unicode characters to force splitting in LTR display order.

I am not keen on solving this via Regex. I am looking for a generic approach to this problem that would also hold for other RTL languages, say Arabic.

Before posting, I checked the following questions, in case this gets marked as a duplicate.

Parsing through Arabic / RTL text from left to right - Presumes that the developer knows where the RTL word is located. I am not using a semicolon as a word separator; it can be any culture-specific character. Also, adding an invisible marker would break later string equality comparisons.

c# split and revers sentence with two languages - The character order is modified there. I wish to keep the characters intact.

Any advice or suggestion that points me in the right direction is appreciated.

STF
Nathan

  • Is your example there the order you get them in? Or the order you want them to be in? – Matt Burland Jun 07 '17 at 13:15
  • @MattBurland Sorry I didn't add that in there. It's the order I am getting it in. I shall update the post to avoid further confusions. – Nathan Jun 07 '17 at 13:15
  • @MattBurland Marking it as a duplicate within seconds shows that you didn't actually read the post. It is not a duplicate of what you've suggested. In that post the resolution provided presumes that the developer knows where the RTL words are located and has to explicitly add a Left-to-Right mark (U+200E) for each RTL word. – Nathan Jun 07 '17 at 13:43

0 Answers