
I'm reading a CSV file and getting a `UnicodeDecodeError` ('invalid continuation byte') from the utf-8 codec. It happens when it reaches a line with the name Maga�a, which is supposed to be Magaña. (If I open the CSV file in Atom, my editor, and let it auto-detect the encoding, it chooses Windows 1252 and converts the � to ñ.)

My question is: how do I convert the � to ñ when opening or reading the file, so that the correct value gets inserted into the database?
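The error message shows the raw byte is `\xf1`. In Windows-1252 (Python's codec name is `cp1252`) that byte is `ñ`. In UTF-8, `0xF1` is a lead byte announcing a four-byte sequence, so the next byte must be a continuation byte in the range 0x80 to 0xBF; `a` (0x61) is not, hence "invalid continuation byte". A minimal sketch, assuming the data arrives as raw bytes; `decode_field` is a hypothetical helper, not part of any library, that tries UTF-8 first and falls back to Windows-1252:

```python
def decode_field(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Windows-1252 (cp1252).

    Hypothetical helper: the fallback assumes any non-UTF-8 input came
    from a Windows-1252 source, as Atom's auto-detection suggests.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


raw = b"Maga\xf1a"  # the bytes shown in the error message

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 fails:", exc.reason)  # utf-8 fails: invalid continuation byte

print(decode_field(raw))  # Magaña
```

Note that `cp1252` leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so the fallback can itself raise on genuinely corrupt input.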

I've tried this as a test:

```python
print 'Maga�a'.decode('windows-1252').encode('utf-8')
```

which prints Maga�a
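A likely explanation for that result (an assumption, since the exact bytes of the test literal aren't shown): the `�` pasted into the source is the Unicode replacement character U+FFFD, which an editor or terminal substituted for the byte `0xF1` when it failed to decode it. Once that substitution has happened the original byte is gone, so no later `.decode()` can turn it back into `ñ`; the conversion has to be applied to the raw bytes read from the file. This Python 3 sketch shows the information loss:

```python
raw = b"Maga\xf1a"  # raw bytes as stored in the (assumed) Windows-1252 file

# Decoding with the wrong codec and errors="replace" substitutes U+FFFD
# (the "�" replacement character) for the byte it could not decode.
damaged = raw.decode("utf-8", errors="replace")
print(damaged == "Maga\ufffda")  # True

# The damaged text now holds the UTF-8 bytes of U+FFFD, not 0xF1 ...
print(damaged.encode("utf-8"))  # b'Maga\xef\xbf\xbda'

# ... so decoding those bytes as cp1252 gives mojibake instead of "ñ".
print(damaged.encode("utf-8").decode("cp1252"))  # Magaï¿½a

# Decoding the original bytes with the right codec works.
print(raw.decode("cp1252"))  # Magaña
```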

Example code:

```python
reader = open("my_csv.txt", "r")
for csv_row in reader:
    # insert_row_sql = 'INSERT INTO sometable VALUES (%s,%s,%s,%s,%s,.... )'
    csv_values = csv_row.replace("\n", "").split(',')
    cursor.execute(insert_row_sql, csv_values)
    # blows up, error msg edited
    # Got error UnicodeDecodeError('utf-8', b'Maga\xf1a', 'invalid continuation byte'), errno is utf-8
```
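In Python 3, `open()` in text mode decodes with a platform-dependent default encoding (UTF-8 here), so one place to fix this is the `open()` call itself: passing `encoding="cp1252"` makes every line arrive as a correctly decoded `str`. A minimal sketch of the loop, assuming the whole file is Windows-1252. It writes a small hypothetical sample file, and uses an in-memory `sqlite3` database with a hypothetical two-column `people` table as stand-ins for the real `cursor` and `insert_row_sql` (`sqlite3` uses `?` placeholders rather than `%s`):

```python
import os
import sqlite3
import tempfile

# Hypothetical sample file written as Windows-1252 bytes, mimicking my_csv.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("1,Magaña\n2,Smith\n".encode("cp1252"))

conn = sqlite3.connect(":memory:")  # stand-in for the real database
cursor = conn.cursor()
cursor.execute("CREATE TABLE people (id TEXT, name TEXT)")  # hypothetical table
insert_row_sql = "INSERT INTO people VALUES (?, ?)"

# The key change: tell open() which encoding the file actually uses.
with open(path, "r", encoding="cp1252") as reader:
    for csv_row in reader:
        csv_values = csv_row.rstrip("\n").split(",")
        cursor.execute(insert_row_sql, csv_values)
conn.commit()
os.remove(path)

print(cursor.execute("SELECT name FROM people ORDER BY id").fetchall())
# [('Magaña',), ('Smith',)]
```

If some files are UTF-8 and others Windows-1252, the encoding would need to be chosen per file rather than hard-coded.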
Sam Luther
  • Have you tried to print this line directly from the file then decode it? It seems that when you copy the `?` character, it doesn't have the same code as the original. – Liran Funaro May 09 '17 at 16:40
  • How are you reading the CSV file? Are you, for example, using the [`csv`](https://docs.python.org/2/library/csv.html) module? – Robᵩ May 09 '17 at 16:43
  • No, I'm not using the csv module, just using open and reading each line. Also, I'm using Python 3. – Sam Luther May 09 '17 at 16:49
  • Please provide a short, complete program that demonstrates the error. See [mcve] for more information. – Robᵩ May 09 '17 at 16:52
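Following up on the suggestion to use the `csv` module: `csv.reader` keeps quoted fields that contain commas intact, which a plain `split(',')` would break apart, and the csv documentation recommends opening the file with `newline=''`. The encoding is still passed to the text layer. A sketch, assuming Windows-1252 input and hypothetical sample rows; to stay self-contained it wraps in-memory bytes with `io.TextIOWrapper`, which decodes the same way `open(path, newline="", encoding="cp1252")` would:

```python
import csv
import io

# Hypothetical file contents as Windows-1252 bytes; one quoted field contains a comma.
data = '1,"Magaña, Ana",Los Angeles\r\n2,Smith,Boston\r\n'.encode("cp1252")

# With a real file this would be: open("my_csv.txt", newline="", encoding="cp1252")
with io.TextIOWrapper(io.BytesIO(data), encoding="cp1252", newline="") as f:
    rows = list(csv.reader(f))

for row in rows:
    print(row)
# ['1', 'Magaña, Ana', 'Los Angeles']
# ['2', 'Smith', 'Boston']
```

Each `row` is a list of `str` that could be passed directly as the parameters of `cursor.execute(insert_row_sql, row)`.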
