I have a spatial database (Esri geodatabase format) containing Arabic characters. When I summarize the values, some cells that should logically be identical are treated as different. For example, "اسدی" and "اسدی" should be (and appear to be) identical, but they are not.

There is no whitespace at the beginning or end of the names, but when I check the lengths of the strings, some are 4 and some are 6. The 6-character strings contain 2 hidden characters, but I can't see them or find them.

How can I remove these characters with Python? I tried `rstrip()` and `lstrip()`, but the length remains 6.

TigerhawkT3
BBG_GIS
  • Just a word of caution here: working with Arabic text is no small task. – TigerhawkT3 Jun 06 '15 at 07:46
  • Take a look here: [Python strip() unicode string?](http://stackoverflow.com/questions/7258411/python-strip-unicode-string) – raymelfrancisco Jun 06 '15 at 07:48
  • 1
    or [Stripping non printable characters from a string (incl. unicode)](http://stackoverflow.com/questions/92438/stripping-non-printable-characters-from-a-string-in-python). – Ami Tavory Jun 06 '15 at 07:49
  • If Python 2.X, use `repr()` on each string and update your question with the results. If Python 3.X, use `ascii()`. For example, `print(ascii('اسدی'))` returns `'\u0627\u0633\u062f\u06cc'`. Then you'll see the Unicode codepoints. – Mark Tolonen Jun 06 '15 at 09:07
  • might help if you showed what the characters were – Padraic Cunningham Jun 06 '15 at 09:21
  • those two strings are identical and same length. try doing what @MarkTolonen suggests – fferri Jun 06 '15 at 13:03
  • Thanks. My friend used SQL Server and T-SQL code and solved the problem. – BBG_GIS Jun 06 '15 at 16:39
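
The inspection suggested in the comments, plus one possible cleanup, could be sketched roughly as follows in Python 3. The hidden characters were never identified in this thread, so the sample below assumes, purely for illustration, that they are invisible format characters (Unicode category `Cf`, here U+200F and U+200E) at the ends of the string. The helper names `describe` and `strip_invisible` are hypothetical, not part of any library.

```python
import unicodedata

def describe(text):
    """Return (code point, category, name) for each character in text."""
    return [
        ("U+%04X" % ord(ch),
         unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

def strip_invisible(text):
    """Trim whitespace and format (Cf) characters from both ends of text."""
    def hidden(ch):
        return ch.isspace() or unicodedata.category(ch) == "Cf"

    start, end = 0, len(text)
    while start < end and hidden(text[start]):
        start += 1
    while end > start and hidden(text[end - 1]):
        end -= 1
    return text[start:end]

# The 4-character name from the question, written as escapes.
clean = "\u0627\u0633\u062f\u06cc"
# Assumption: two invisible marks surround the name (not confirmed by the asker).
dirty = "\u200f" + clean + "\u200e"

for row in describe(dirty):
    print(row)                      # shows which code points are actually present

print(len(dirty.strip()))           # str.strip() only removes whitespace
print(strip_invisible(dirty) == clean)
```

This version trims only the ends of the string, because some format characters (for example the zero-width non-joiner, U+200C) can be meaningful inside Persian or Arabic words. If `describe` reveals something other than category `Cf` characters, the `hidden` test would need to be adjusted accordingly.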

0 Answers