Python readline not working with codecs

Question

I am trying to open, print, and read a text file that contains special characters such as §. Below is the code I am running:

    import codecs
    f = codecs.open('sample_text.txt', mode='r', encoding='utf_8')
    print f.readline()

The first two lines work, but the third does not. The error message says:

    Traceback (most recent call last):
      File "C:\Users\mallikk\Documents\Python Scripts\special_char_test.py", line 6, in <module>
        print f.readline()
      File "C:\Anaconda2\lib\codecs.py", line 690, in readline
        return self.reader.readline(size)
      File "C:\Anaconda2\lib\codecs.py", line 545, in readline
        data = self.read(readsize, firstline=True)
      File "C:\Anaconda2\lib\codecs.py", line 492, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 13: invalid start byte

Any ideas? Please let me know if I can clarify anything or add more details. Thank you so much!

This file is not encoded in UTF-8. Find the actual encoding and use that. — user2357112, Jun 23 '16 at 16:42
I don't think that 0xa7 is valid utf8. Are you sure it's in utf-8? Also why are you using codecs and not `open`? — syntonym, Jun 23 '16 at 16:47
http://stackoverflow.com/questions/4255305/how-to-determine-encoding-table-of-a-text-file — stark, Jun 23 '16 at 16:53
@user2357112 It was not in utf-8. I changed it in Notepad++. Thanks for the help! — Shivani, Jun 23 '16 at 16:57
@syntonym I was under the impression that to deal with special characters like §, I would need to use codecs — Shivani, Jun 23 '16 at 16:57
@Shivani [This question](http://stackoverflow.com/questions/5250744/difference-between-open-and-codecs-open-in-python) discusses codecs.open vs builtin open and io.open. Looks like you are right in python2 while in python3 `open` is preferred. — syntonym, Jun 23 '16 at 17:07

score 1 · Accepted Answer · answered Jun 23 '16 at 16:55

To expand on what the commenters said, you need to find out the encoding of your file. The easiest way I know to do that is to:

1. Open the file in Firefox.
2. Right-click on the page and select "View Page Info".
3. See what the "Text Encoding" is.

Then you can check the codecs documentation for the codec to use instead of utf_8 in your f = codecs.open(...) line.
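
If you would rather check from Python than from a browser, a rough sketch is to try a short list of candidate encodings and see which one decodes the whole file without an error. The function name and the candidate list below are my own assumptions, not anything from the codecs module. A successful decode is only a hint: latin-1 accepts any byte sequence, so it never fails, and you should still look at the decoded text to confirm it reads correctly.

```python
import io


def first_working_encoding(path, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return the first candidate that decodes the whole file, or None.

    Hypothetical helper for illustration. Order matters: put the strictest
    encodings first, because latin-1 decodes every possible byte.
    """
    with io.open(path, 'rb') as f:
        raw = f.read()  # read raw bytes, no decoding yet
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot be the right one
        return name
    return None
```

You would then pass the returned name as the encoding argument in place of utf_8.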

[Screenshot of steps 1–3]

score 0 · Answer 2 · edited May 23 '17 at 11:58

It looks like you are on a Windows machine, where the text file's encoding may well be something other than UTF-8. Try decoding the bytes with cp1252 or ISO-8859-1 instead, and then encode the result as UTF-8 if you need UTF-8 output.
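
Here is a minimal sketch of that decode-then-encode step. The sample bytes are made up for illustration (I don't know what your file contains), and reencode_to_utf8 is a hypothetical helper name. It assumes the source really is cp1252, which you should confirm first. In both cp1252 and ISO-8859-1 the byte 0xa7 is the section sign, which is why a lone 0xa7 fails as UTF-8 but decodes fine with either of those.

```python
import io

# Hypothetical sample bytes: 0xa7 is the section sign in cp1252 and ISO-8859-1.
raw = b'See section: \xa7 4\n'

try:
    raw.decode('utf-8')
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False  # a lone 0xa7 byte is not valid UTF-8

text = raw.decode('cp1252')        # bytes -> unicode text
utf8_bytes = text.encode('utf-8')  # section sign becomes the two bytes 0xc2 0xa7


def reencode_to_utf8(src_path, dst_path, src_encoding='cp1252'):
    """Rewrite src_path as UTF-8 at dst_path (src_encoding is an assumption)."""
    # newline='' keeps the original line endings untouched in both directions.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as src:
        content = src.read()
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(content)
```

Writing to a separate destination path keeps the original file intact in case the guessed source encoding turns out to be wrong.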

For best-practice advice on how to read files, see also: Difference between open and codecs.open in Python.
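
As a rough illustration of what that discussion recommends, io.open takes an encoding argument and is available in Python 2.6+ as well as Python 3, where the builtin open is the same function. The helper name below is hypothetical, and the encoding you pass in is whatever you determined for your own file.

```python
import io


def read_first_line(path, encoding):
    """Return the first line of a text file, decoded with the given encoding."""
    # io.open decodes while reading, so readline() returns unicode text.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.readline()
```

For example, read_first_line('sample_text.txt', 'cp1252') would replace the codecs.open call from the question, assuming the file turns out to be cp1252.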

answered Jun 23 '16 at 17:17

Stanley Kirdey
