
I have a file coming from Oracle Fusion with the name Hyderabad - Telangana

When I received it on the server, the hyphen had become a special character – .

We do a lookup on this value, and it fails because of the special character.

I downloaded the document to a local drive and I can see the hyphen properly.

I tried looking for a solution, and most of the answers say this is an encoding issue.

How can I find the encoding of a file in Unix?

James Z
  • duplicate of https://stackoverflow.com/questions/805418/how-to-find-encoding-of-a-file-in-unix-via-scripts ? – KamilCuk Jun 22 '18 at 10:24
  • Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See [What topics can I ask about here](http://stackoverflow.com/help/on-topic) in the Help Center. Perhaps [Super User](http://superuser.com/) or [Unix & Linux Stack Exchange](http://unix.stackexchange.com/) would be a better place to ask. – jww Jun 23 '18 at 06:11
  • *"How to find the encoding of a file in unix?"* - [How to auto detect text file encoding?](https://superuser.com/q/301552/173513) and friends like [Determine text file character set Linux](https://www.google.com/search?q=Determine+text+file+character+set+Linux) – jww Jun 23 '18 at 06:13

1 Answer


Because it was not a normal hyphen but an EN DASH, Unicode U+2013. When encoded in UTF-8 it becomes the three bytes "\xe2\x80\x93". In cp1252 (and Latin-1), the first byte, 0xE2, is the code of 'â', which is what led me down that path.
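As a quick check of that byte sequence, here is a minimal Python sketch (the variable names are only illustrative, not something from the question):

```python
import unicodedata

en_dash = "\u2013"                    # the character that replaced the hyphen
utf8_bytes = en_dash.encode("utf-8")  # its UTF-8 encoding

print(unicodedata.name(en_dash))      # EN DASH
print(utf8_bytes)                     # b'\xe2\x80\x93'

# Reading only the first byte as cp1252 gives the 'â' mentioned above.
first_char = bytes([utf8_bytes[0]]).decode("cp1252")
print(first_char)                     # â
```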

Interestingly enough, the other two bytes are also defined in the cp1252 charset, which is common on Western European language versions of Windows. They are, respectively:

Byte    Character in cp1252    Unicode code point    Name
0x80    €                      U+20AC                EURO SIGN
0x93    “                      U+201C                LEFT DOUBLE QUOTATION MARK
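To illustrate, the sketch below reproduces the garbled text by decoding the UTF-8 bytes as cp1252, then reverses the damage. The helper name `repair_mojibake` and the sample string are assumptions made for illustration, and the round trip only works if the text was mis-decoded exactly once as cp1252. Replacing the en dash with a plain hyphen for the lookup is likewise just one possible choice.

```python
def repair_mojibake(text: str) -> str:
    """Undo a single 'UTF-8 read as cp1252' mis-decoding, if possible."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind; leave the text unchanged.
        return text

# Hypothetical sample: the name with an EN DASH instead of a hyphen.
original = "Hyderabad \u2013 Telangana"

# Simulate what happens when the UTF-8 bytes are read as cp1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # Hyderabad â€“ Telangana

repaired = repair_mojibake(garbled)

# Optionally normalise the en dash to an ASCII hyphen before the lookup.
lookup_key = repaired.replace("\u2013", "-")
print(lookup_key)  # Hyderabad - Telangana
```

If the file itself really contains U+2013, the cleaner fix is probably to read it as UTF-8 on the server rather than to repair the text afterwards.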
Serge Ballesta