Python decodes a byte string correctly on Windows but fails on Linux with a Unicode error

Question

I am trying to decode a byte string that contains the Unicode character EN DASH, to get the proper Unicode string.

The code below runs fine on Windows with Python 3.6:

decode_header_sequence = [(b'Excel to csv \xe2\x80\x93 Conversion .csv', 'utf-8')]
print(decode_header_sequence[0][0].decode('utf-8'))

which gives me the string 'Excel to csv – Conversion .csv'.

But when I execute the same lines on Linux, the code fails with a Unicode error: 'ascii' codec can't encode characters in position 16-18: ordinal not in range(128)

I have tried almost everything I found in threads like this, but with no luck. Can anyone help me solve this issue? I really don't know why this is happening.

The decoding goes fine. The problems happen when you try to *print*. — user2357112, Apr 03 '20 at 12:58
@user2357112 supports Monica what problem is happening there..can you please elaborate me this? — Anvita, Apr 03 '20 at 13:06
Possibly useful https://stackoverflow.com/a/57224678/5320906, https://stackoverflow.com/a/54599110/5320906. — snakecharmerb, Apr 03 '20 at 13:47
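
As the first comment points out, the decode step is not what fails. A minimal sketch, assuming the Linux terminal's output encoding is ASCII (simulated here with an explicit encode call instead of a real print), separates the two steps:

```python
data = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# Step 1: decoding the UTF-8 bytes into a str works on any platform.
text = data.decode('utf-8')

# Step 2: print() has to encode the str for the terminal.
# Assumption: the terminal encoding is ASCII, so we encode to ASCII
# by hand to reproduce the same kind of failure.
try:
    text.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# error_message now holds an "'ascii' codec can't encode ..." message.
```

Checking sys.stdout.encoding on both machines would show whether this assumption matches the actual setup.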

score 0 · Answer 1 · answered Apr 08 '20 at 14:42

On Windows I found that sys.getfilesystemencoding() was set to UTF-8, while on Linux it was set to ascii. Windows therefore handled the UTF-8 characters in the input string without trouble, but Linux raised an error. I got rid of the error by dropping the non-ASCII characters from the input string. I decoded the string as below:

ascii_String = input_string.decode('ascii', errors="ignore").encode('ascii')
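
Note that this approach discards the EN DASH. If the character needs to survive, one possible alternative is to write UTF-8 bytes directly to a binary stream. The sketch below is only an illustration: write_utf8_line is a hypothetical helper, and io.BytesIO stands in for sys.stdout.buffer so the result can be inspected.

```python
import io

input_string = b'Excel to csv \xe2\x80\x93 Conversion .csv'

# The approach above: bytes outside ASCII are silently dropped.
lossy = input_string.decode('ascii', errors='ignore')

def write_utf8_line(text, binary_stream):
    """Hypothetical helper: write text as UTF-8 bytes to a binary stream,
    bypassing whatever text encoding the locale selected."""
    binary_stream.write(text.encode('utf-8') + b'\n')

# In a real script binary_stream would be sys.stdout.buffer;
# BytesIO is used here only to capture the output.
demo_stream = io.BytesIO()
write_utf8_line(input_string.decode('utf-8'), demo_stream)
```

Here lossy ends up with two consecutive spaces where the dash used to be, while demo_stream receives the original bytes unchanged. Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python is another option that may avoid code changes, assuming the terminal itself can display UTF-8.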
