
I am trying to decode a byte string that contains the Unicode character EN DASH (U+2013) to get the proper Unicode string.

The code below runs fine on Windows with Python 3.6:

decode_header_sequence = [(b'Excel to csv \xe2\x80\x93 Conversion .csv', 'utf-8')]
print(decode_header_sequence[0][0].decode('utf-8'))

which gives me the string 'Excel to csv – Conversion .csv'.

But when I run the same lines on Linux, the code fails with a Unicode error: 'ascii' codec can't encode characters in position 16-18: ordinal not in range(128)
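The error message says *encode*, not *decode*: turning the UTF-8 bytes into a str works, and the failure most likely comes from converting that str back to ASCII, which is what print() has to do when the output stream's encoding is ASCII. A minimal sketch, assuming the same byte string as above, that reproduces both steps explicitly so it behaves the same on any platform:

```python
import unicodedata

# Bytes from the question: the EN DASH is the three-byte UTF-8 sequence e2 80 93.
raw = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# Step 1: decoding the UTF-8 bytes into a str succeeds on any platform.
text = raw.decode('utf-8')
dash = text[13]
print(hex(ord(dash)), unicodedata.name(dash))  # 0x2013 EN DASH

# Step 2: encoding that str as ASCII fails, because U+2013 has no ASCII form.
# print() performs this kind of conversion when sys.stdout uses ASCII.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc)
```

The variable names `raw`, `text` and `dash` are illustrative only.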

I have tried almost everything I found in similar threads, with no luck. Can anyone help me solve this? I really don't know why this is happening.

Asked by Anvita, edited by sjakobi.

1 Answer


On Windows I found that sys.getfilesystemencoding() returned UTF-8, while on Linux it returned ascii. That is why the UTF-8 characters in the input string were handled without problems on Windows but raised an error on Linux. I got rid of the error by ignoring the non-ASCII characters in the input string, decoding it as follows:

# Drop every non-ASCII byte (the EN DASH is lost), then re-encode the result as ASCII bytes.
ascii_string = input_string.decode('ascii', errors='ignore').encode('ascii')
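To compare the two machines, a small diagnostic sketch can print the filesystem encoding mentioned above next to the encodings that usually decide how print() behaves (sys.stdout.encoding and the locale's preferred encoding); the actual values depend on the platform, the Python version and environment variables such as LANG. It then shows what the ignore-based workaround returns, assuming the byte string from the question:

```python
import locale
import sys

# Encodings that can differ between Windows and Linux.
print('filesystem encoding:', sys.getfilesystemencoding())
print('stdout encoding:    ', sys.stdout.encoding)
print('locale preferred:   ', locale.getpreferredencoding(False))

input_string = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# The workaround from this answer: the three non-ASCII bytes are silently
# dropped, and the trailing .encode('ascii') makes the result bytes, not str.
ascii_string = input_string.decode('ascii', errors='ignore').encode('ascii')
print(ascii_string)  # b'Excel to csv  Conversion .csv' (dash gone, two spaces left)
```

If a str is needed rather than bytes, leaving off the final `.encode('ascii')` returns the same ASCII-only text as a str.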
Answered by Anvita.