Long story short, I have a string like the following:
const string example1 = "good בוקר טוב morning";
I am trying to split the above text into 4 words (tokens, in my case) for processing. No matter which approach I take, I still get the 2 Hebrew words in the wrong order. By order I mean the display order, not the logical order.
When using String.Split on example1:
Returned
- good
- בוקר
- טוב
- morning
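For reference, a minimal repro of the String.Split case looks roughly like the sketch below. The class name SplitRepro and the helper Tokenize are placeholders made up for this post, and splitting on a single space is an assumption (my real separators vary by culture).

```csharp
using System;
using System.Text;

static class SplitRepro
{
    const string example1 = "good בוקר טוב morning";

    // Hypothetical helper: splits on a single space.
    // Tokens come back in logical (storage) order.
    public static string[] Tokenize(string text)
    {
        return text.Split(' ');
    }

    static void Main()
    {
        // Without UTF-8 output the Hebrew may print as '?' on some consoles.
        Console.OutputEncoding = Encoding.UTF8;

        // One token per line, so the order seen here is the array order
        // and not a side effect of bidi rendering within a single line.
        foreach (string token in Tokenize(example1))
        {
            Console.WriteLine("- " + token);
        }
    }
}
```

The array order matches the order in which the characters are stored in the string, which is not the order in which the Hebrew run is displayed inside an LTR line.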
In the list above, the 2 Hebrew words are in the wrong (display) position.
Expected
- good
- טוב
- בוקר
- morning
I have tried using Regex.Split, forcing invariant cultures, and even appending Unicode characters to force splitting in LTR display order.
I am not keen on solving this via Regex. I am looking for a generic approach to this problem that would also hold up for other RTL languages, such as Arabic.
Before making this post I checked the following questions, in case this gets marked as a duplicate.
Parsing through Arabic / RTL text from left to right - Presumes that the developer knows where the RTL word is located. I am not using a semi-colon as a word separator; it can be any culture-specific character. Also, adding an invisible marker would cause future string equality comparisons to fail.
c# split and revers sentence with two languages - The character order is modified. I wish to keep it intact.
Any advice or suggestion that puts me in the right direction is appreciated.