3

I am trying to open a file that I know contains some bytes that are not valid UTF-8. In Python 3 I would do:

open(fileName, 'r', errors='ignore')

But now I need to use Python 2. What is the corresponding way to do this?

Below is my code after switching to codecs:

    with codecs.open('data/journalName1.csv', 'rU', errors="ignore") as file:
        reader = csv.reader(file)
        for line in reader:
            print(line) 

The file is here: https://www.dropbox.com/s/9qj9v5mtd4ah8nm/journalName.csv?dl=0

1a1a11a

2 Answers

8

Python 2's built-in open function does not accept an errors argument. Instead, you have to use codecs.

import codecs
# errors only takes effect when an encoding is also given
f = codecs.open(fileName, 'r', encoding='utf-8', errors='ignore')

This works in both Python 2 and 3, in case you need to switch Python versions in the future.
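
One caveat: codecs.open only wraps the file in a decoder when an encoding is passed. Without one it returns the result of the plain built-in open and the errors argument has no effect. Below is a minimal sketch, assuming the file is meant to be UTF-8. The helper name read_lossy and the sample bytes are made up for illustration and are not taken from the linked file.

```python
import codecs
import os
import tempfile


def read_lossy(path, encoding="utf-8"):
    """Read a text file, silently dropping bytes that are invalid in `encoding`."""
    # The explicit encoding is what makes errors="ignore" take effect.
    with codecs.open(path, "r", encoding=encoding, errors="ignore") as f:
        return f.read()


# Hypothetical sample: ASCII CSV content with one stray 0xe6 byte in a field.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"name,title\nfoo,b\xe6ar\n")

text = read_lossy(sample_path)
os.remove(sample_path)
print(text)
```

Note that ignoring errors discards data without warning, so it is worth checking afterwards whether the dropped bytes mattered.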

washcloth
  • It is still not correct: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 2028: invalid continuation byte". There is still an error. I am going to add my code and upload the file. – 1a1a11a Jun 08 '15 at 02:35
  • Actually, if I just copy the lines into another file, there is no error. For this file, though, the old way in Python 3 works, but codecs.open in Python 2 still raises an error. Please help me, thank you! – 1a1a11a Jun 08 '15 at 02:43
1

For UTF-8 encoded files I would suggest the io module.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import io

# io.open returns decoded (unicode) text on both Python 2 and 3
with io.open('file.txt', 'r', encoding='utf8') as f:
    s = f.read()
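
If the file does contain bytes that are not valid UTF-8, as the question describes, io.open also accepts an errors argument on Python 2.6+ and 3. Here is a rough sketch comparing errors="ignore" with errors="replace". The sample bytes and variable names are invented for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample: valid UTF-8 ("caf\u00e9") plus one invalid byte (0xe6).
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"caf\xc3\xa9 b\xe6ar\n")

# errors="ignore" drops the undecodable byte entirely.
with io.open(sample_path, "r", encoding="utf-8", errors="ignore") as f:
    ignored = f.read()

# errors="replace" substitutes U+FFFD, so damaged spots stay visible.
with io.open(sample_path, "r", encoding="utf-8", errors="replace") as f:
    replaced = f.read()

os.remove(sample_path)
```

Using "replace" instead of "ignore" can make it easier to find where the file is damaged.
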
Alex Ivanov
  • `some errors in the file with UTF-8 encoding` means that it really isn't a pure UTF-8 file. – washcloth Jun 08 '15 at 02:23
  • My guess was the OP got an error: "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)" or the like. It happens sometimes with UTF-8 decoded strings. Such an error is usually fixed by encoding in UTF-8, not ASCII. – Alex Ivanov Jun 08 '15 at 02:57