Python reading a text file: ’re becomes 鈥檙e

Question

I am reading a text file with the following sentence:

"So whether you’re talking about a Walmart or an IKEA or a Zara, you are really interested in keeping the cost low, keeping the process very efficient."

My code:

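import glob
import re

files = "*.txt"
for pathname in glob.glob(files):
    with open(pathname, 'r') as singlefile:  # no encoding is specified here
        data = "".join(singlefile.readlines())
        data = re.sub(r"(?<=\w)\n", " ", data)  # join lines broken after a word character
        data = re.sub(r",\n", ", ", data)        # join lines broken after a comma
        print(data)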

The result I got is:

"So whether you鈥檙e talking about a Walmart or an IKEA or a Zara, you are really interested in keeping the cost low, keeping the process very efficient. That gives us operational excellence."

Can anyone tell me what is wrong? Thanks!

Have a look at the encoding. It looks like the ' is not recognized. — , May 30 '14 at 02:19
You need to read the file using the encoding it was saved as. — SLaks, May 30 '14 at 02:19
Either you find out from the person who gave you the file, or you guess. — Mark Ransom, May 30 '14 at 02:27
P.S. It would help if you `print repr(data)` so we can see the exact bytes. — Mark Ransom, May 30 '14 at 02:29
It is an encoding problem; reading the file as "utf-8" works: codecs.open("file.txt", "r", "utf-8") — Niebieski, Jun 01 '14 at 14:16
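Following the suggestions to look at `repr(data)` and to read the file as UTF-8, here is a minimal Python 3 sketch of one plausible explanation for the exact garbage in the question. It assumes (this is inferred from the symptom, not stated by the OP) that the file is UTF-8 and that the bytes were displayed with the GBK (cp936) code page of a Chinese-locale Windows console: the UTF-8 bytes of ’ (E2 80 99) plus the following "r", read as GBK, come out as 鈥檙. The temporary file only stands in for the OP's real file.

```python
import os
import tempfile

# Assumption: the file is UTF-8, and the garbled output came from showing
# those bytes with the GBK (cp936) code page of a Chinese-locale console.
text = "you\u2019re"  # "you’re" with U+2019 RIGHT SINGLE QUOTATION MARK

raw = text.encode("utf-8")
print(repr(raw))  # b'you\xe2\x80\x99re'

# Decoding the UTF-8 bytes with the wrong codec reproduces the question's output.
garbled = raw.decode("gbk")  # "you鈥檙e"

# Fix: tell Python which encoding the file was saved with when opening it.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(raw)  # stand-in for the original UTF-8 text file

with open(path, "r", encoding="utf-8") as singlefile:  # Python 3 built-in open
    data = singlefile.read()
os.remove(path)

assert data == text
```

In Python 2, `io.open(pathname, "r", encoding="utf-8")` (or `codecs.open` as in the comment above) likewise returns decoded unicode text instead of raw bytes.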

score 0 · Accepted Answer · edited May 23 '17 at 12:05

If you get the encoding right, it works just fine (for finding the right encoding, also look here, where they describe an encoding guess list, which is a neat idea). I tried it with:

import re

with open("words.txt",'r') as singlefile:
    data = "".join(singlefile.readlines())
    data = re.sub(r"(?<=\w)\n", " ", data)
    data = re.sub(r",\n", ", ", data)
    print(data)

And the file "words.txt" contains this:

 So whether you’re talking about a Walmart or an IKEA or a Zara, you are really interested in keeping the cost low, keeping the process very efficient.

This is the output:

>>> runfile('E:/programmierung/python/spielwiese/test.py', wdir=r'E:/programmierung/python/spielwiese')
So whether you’re talking about a Walmart or an IKEA or a Zara, you are really interested in keeping the cost low, keeping the process very efficient.
>>>
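The encoding guess list idea mentioned in this answer can be sketched in Python 3 roughly as follows. The names `read_text_guessing` and `normalize_breaks` and the candidate list are hypothetical choices for illustration. Order matters: strict encodings such as UTF-8 go first, while latin-1 accepts any byte sequence and so only works as a last resort. A successful decode only shows the bytes are valid in that encoding, not that it is the one the file was saved with.

```python
import re

# Hypothetical candidate list, tried in order. "utf-8" rejects most non-UTF-8
# input, so it goes first; "latin-1" decodes any bytes, so it must be last.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_text_guessing(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes the file."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next candidate
    raise ValueError("none of %r could decode %s" % (encodings, path))


def normalize_breaks(data):
    # Same substitutions as in the question: join lines broken mid-sentence.
    data = re.sub(r"(?<=\w)\n", " ", data)
    data = re.sub(r",\n", ", ", data)
    return data
```

In the question's loop this would replace the plain `open` call: `data, enc = read_text_guessing(pathname)` followed by `data = normalize_breaks(data)`.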

