-1

I want to know the ANSI value of the character "\u202B" (RIGHT-TO-LEFT EMBEDDING), which forces RTL alignment in a text file. In a UTF-8 file it works and the text is shown right-to-left, but when the text file is ANSI the character shows up as "???", which means it is not recognized. Does anyone know the equivalent code for this character in ANSI?

Thomas Dickey
Mahmoud Ismail

2 Answers

1

Windows-1256 is the "ANSI code page" when the system locale is set to Arabic. The name is a misnomer, but that is what all Microsoft documentation calls it. In the Windows world, "ANSI code page" should be read as "system code page".

Anyway, U+202B has no equivalent in windows-1256. You can probably achieve what you need with one of these marks:

U+200E  LEFT-TO-RIGHT MARK    0xFD in windows-1256
U+200F  RIGHT-TO-LEFT MARK    0xFE in windows-1256  
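
To make the mapping above concrete, here is a minimal Java sketch that prefixes a line with RIGHT-TO-LEFT MARK and encodes it as windows-1256. The class and method names (`RtlMarkWriter`, `encodeWithRlm`, `writeLine`) are made up for illustration, and it assumes your JVM ships the windows-1256 charset (standard JDK builds normally do).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
class RtlMarkWriter {
    // Assumption: the running JVM supports the windows-1256 charset.
    static final Charset CP1256 = Charset.forName("windows-1256");

    // U+200F RIGHT-TO-LEFT MARK, which windows-1256 stores as the byte 0xFE.
    static final char RLM = '\u200F';

    // Prefix the line with RLM and encode the result as windows-1256 bytes.
    static byte[] encodeWithRlm(String line) {
        return (RLM + line).getBytes(CP1256);
    }

    // Write the encoded bytes as-is, so no second charset conversion happens.
    static void writeLine(Path target, String line) throws IOException {
        Files.write(target, encodeWithRlm(line));
    }
}
```

Whether the mark alone gives you the alignment you want depends on the program that displays the file, so treat this as something to try rather than a guaranteed fix.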
Mihai Nita
  • How can I write this character to the text file using Java? Should I use "0XFE" or "\u0XFE"? I've tried a lot, but the character that appears doesn't map to the RIGHT-TO-LEFT MARK. – Mahmoud Ismail Jul 16 '14 at 08:23
  • Write the raw byte `0xFE` (in Java that is `(byte) 0xFE`; there is no `\xFE` string escape). Make sure you work on byte arrays and not Strings: if you put the byte value into a String, the special character can get lost when the String is converted to bytes. – Aaron Digulla Jul 16 '14 at 10:19
  • This might also work: `new String("\u200f").getBytes("Windows-1256")` should give you a byte array of size 1 with the single byte `0xFE` (or `-2` as a signed Java byte) in it. – Aaron Digulla Jul 16 '14 at 10:21
0

There isn't one. ANSI is a pretty old standard by the American National Standards Institute. It doesn't support RTL languages like Arabic or Hebrew.

The Wikipedia Article "ANSI escape code" lists all the codes that it supports.

The workaround is to use a font which renders the glyphs (characters) you need, print them in the opposite order and use cursor movement commands to right align the text.

[EDIT] You're confusing a couple of things. First of all, ANSI is a set of escape sequences to control your terminal.

ASCII, Windows-1256 and UTF-8 are character encodings (i.e. ways to represent text as sequences of octets or bytes).

Unicode is a catalog of characters (code points), not of glyphs. It tries to contain each and every character that you need to write text in any language. To serialize Unicode data, you encode it using UTF-8, UTF-16, etc.

The special Unicode character RIGHT-TO-LEFT EMBEDDING (U+202B) has no representation in Windows-1256 or in the other legacy single-byte code pages; only the Unicode encodings (UTF-8, UTF-16, etc.) can represent it.
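
If you want to check this yourself from Java, a small sketch like the following asks a charset whether it can encode a given character. The class name `EmbeddingCheck` is hypothetical, and the sketch assumes the JVM provides the windows-1256 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for probing what a charset can represent.
class EmbeddingCheck {
    // True if the charset has a byte sequence for the given character.
    static boolean canEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // Assumption: windows-1256 is available in this JVM.
        Charset cp1256 = Charset.forName("windows-1256");
        char rle = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
        char rlm = '\u200F'; // RIGHT-TO-LEFT MARK

        System.out.println("U+202B in windows-1256: " + canEncode(cp1256, rle));
        System.out.println("U+200F in windows-1256: " + canEncode(cp1256, rlm));
        System.out.println("U+202B in UTF-8: " + canEncode(StandardCharsets.UTF_8, rle));

        // String.getBytes substitutes '?' (0x3F) for unmappable characters,
        // which matches the "???" symptom described in the question.
        byte[] bytes = String.valueOf(rle).getBytes(cp1256);
        System.out.println("Encoded U+202B byte: 0x" + Integer.toHexString(bytes[0] & 0xFF));
    }
}
```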

You will have to write a program to parse the input and then you will have to output the text to the printer, sorting the characters in the correct order. There is no shortcut to do this.
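
As a rough idea of what such a program could look like in Java, the sketch below uses the JDK's `java.text.Bidi` class to split a line into directional runs, reverses the right-to-left runs, and puts the runs into visual order. The name `VisualOrder.toVisualOrder` is made up, and the sketch is deliberately simplified: it ignores mirrored characters such as parentheses, combining marks, and Arabic letter shaping, all of which a real printer driver would have to handle.

```java
import java.text.Bidi;

// Hypothetical helper: converts one line from logical to visual order.
class VisualOrder {
    static String toVisualOrder(String logical) {
        if (logical.isEmpty()) {
            return logical;
        }
        // Take the base direction from the first strong character,
        // falling back to right-to-left if there is none.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);

        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left, so reverse those runs.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Reorder the runs themselves from logical to visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }
}
```

The returned string can then be sent to a device that prints characters strictly left to right.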

Aaron Digulla