
Sorry if this isn't a reproducible example, but I am guessing someone will know what to do when I describe the problem. I am getting characters like "\xe2" and "\x80" from a .txt file that I am reading in the following way:

words = open("directory/file.txt", "r")
liness = []
for x in words.readlines():
    liness.append(x.rstrip('\n'))  # strip the trailing newline from each line

When I print liness I get the list I want, but when I use max() in the following way:

max(liness, key=len)

it returns "a line from file.txt that contains \xe2 and \x80". I know this probably has something to do with encoding, but I haven't had any luck solving it. Anyone?

Michael0x2a
theamateurdataanalyst

1 Answer


I tried to reproduce your error using the following code:

words = open("directory/file.txt", "r")
line = words.readline()
wordlist = line.split()  # split the first line on whitespace

Unfortunately, as you might have guessed, I was not able to reproduce your error. My file was a .txt file containing a list of English words.
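The symptom only shows up when the file actually contains non-ASCII text. The sketch below is a hypothetical reproduction, assuming the file is UTF-8 and contains a curly apostrophe (U+2019); the sample text and the temporary file are stand-ins for directory/file.txt, not your real data. Reading the file as raw bytes, which is roughly what a Python 2 str holds, makes max() return a line containing \xe2\x80... escapes.

```python
import os
import tempfile

# Hypothetical sample content: a right single quotation mark (U+2019)
# is stored in UTF-8 as the three bytes \xe2\x80\x99.
sample = "It\u2019s a line with a curly apostrophe\nshort line\n"

# Write the sample to a temporary UTF-8 file (stand-in for directory/file.txt).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(sample)

# Read in binary mode: each line is raw bytes, much like a Python 2 str.
with open(path, "rb") as f:
    raw_lines = [line.rstrip(b"\n") for line in f]

# len() counts bytes here, so the apostrophe contributes 3 to the length.
longest_raw = max(raw_lines, key=len)
print(longest_raw)  # the repr shows the \xe2\x80\x99 escape sequence

os.remove(path)
```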

I assume you are reading a .txt file that contains non-ASCII characters (anything outside standard American English), correct? If so, you might want to check out this post:

Handling non-standard American English Characters and Symbols in a CSV, using Python

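You will need to determine which encoding your file uses and decode it accordingly. The bytes \xe2\x80 are the start of a three-byte UTF-8 sequence (a curly apostrophe ’, for example, is stored as \xe2\x80\x99), so the file is most likely UTF-8.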
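A minimal sketch of the fix, assuming the file is UTF-8 (that encoding is an assumption; substitute whatever your file actually uses). Passing encoding to open() decodes each line into a str of characters, so the escapes disappear and len() counts characters rather than bytes. The sample text and temporary file are hypothetical stand-ins for directory/file.txt.

```python
import os
import tempfile

# Hypothetical sample file contents; assumes the real file is UTF-8 encoded.
sample = "It\u2019s a line with a curly apostrophe\nshort line\n"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(sample)

# Open in text mode with an explicit encoding so each line is decoded
# into characters instead of raw bytes.
with open(path, "r", encoding="utf-8") as words:
    liness = [line.rstrip("\n") for line in words]

# longest is a str: the apostrophe counts as one character, not three bytes.
longest = max(liness, key=len)
print(len(longest))

os.remove(path)
```

If you are still on Python 2, io.open(path, "r", encoding="utf-8") accepts the same encoding argument and returns decoded text in the same way.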

Community
rsk22