I am using the following snippet to read a file. It produces different output on a Windows server and on a Linux server. I am using Python 3.

with open('test.txt', 'rb') as f:
    data = f.read().decode('utf-8')
    lines = data.splitlines()  # split once and reuse the result
    print(type(lines[34560]))
    print(lines[34560])

Result on Windows:

<class 'str'>
testpair14/user_photos/images/282/original/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png

Result on Linux:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 53-54: ordinal not in range(128)

What could be the reason for this? Please suggest.

Srikanth Bhandary

1 Answer

To get started, read https://docs.python.org/3/howto/unicode.html

To read a text file, just open it as a text file and specify an encoding if needed:

open('test.txt','r', encoding="utf-8")

Read operations on that file will then return Unicode strings (str) rather than byte strings (bytes). As a rule, use str objects whenever you handle text.
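As a minimal sketch of the approach described above, assuming the file really is UTF-8 encoded (the helper name read_lines is hypothetical and not part of the question):

```python
def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file as str objects (hypothetical helper)."""
    # Text mode plus an explicit encoding makes decoding independent of the
    # platform's default locale encoding.
    with open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()
```

Calling read_lines('test.txt')[34560] should then give the same str on both platforms; whether it can be printed still depends on the console's encoding.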

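Printing Unicode to the console is another can of worms, and it is poorly supported on Windows in particular. Note that the error in the question is a UnicodeEncodeError: the decoding succeeded, and it is print() that fails because the Linux session's output encoding is ASCII. There are already plenty of answers to that problem on Stack Overflow, e.g. "Python, Unicode, and the Windows console" and "Understanding Python Unicode and Linux terminal".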
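One possible workaround for a console whose encoding cannot represent every character (for example an ASCII locale, which would produce the kind of UnicodeEncodeError shown in the question) is to escape the unencodable characters before printing. This is only a sketch; the helper name safe_print is hypothetical:

```python
import sys


def safe_print(text, stream=None):
    """Print text, escaping characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's encoding: unencodable characters become
    # backslash escapes instead of raising UnicodeEncodeError.
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    print(safe, file=stream)
```

If the terminal itself can display UTF-8, it is usually better to fix the output encoding instead, for instance by setting the PYTHONIOENCODING environment variable to utf-8 or, on Python 3.7 and later, by calling sys.stdout.reconfigure(encoding="utf-8").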

roeland