I'm reading a CSV file and getting a UnicodeDecodeError from the utf-8 codec: 'invalid continuation byte'.
It happens when the reader gets to a line containing the name Maga�a, which is supposed to be Magaña.
(If I open the CSV file in Atom, my editor, and let it auto-detect the encoding, it chooses Windows-1252 and converts the � to ñ.)
My question is: how do I convert � to ñ when opening or reading the file for insertion into a database?
I've tried this as a test:
print 'Maga�a'.decode('windows-1252').encode('utf-8')
which prints Maga�a
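To take the editor and terminal encodings out of the picture, here is a minimal sketch of the same test written against the raw bytes from the error message. It assumes the byte in the file really is 0xF1 (ñ in Windows-1252, which is what Atom's detection suggests); the variable names are only illustrative.

```python
# The bytes reported in the error message, written with an explicit escape
# so the encoding of this source file does not matter.
raw = b"Maga\xf1a"

# In UTF-8, 0xF1 starts a multi-byte sequence, but the next byte ('a') is
# not a continuation byte, so decoding as UTF-8 fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Windows-1252 (alias cp1252) maps 0xF1 to the character ñ.
name = raw.decode("cp1252")

# Re-encoding the decoded text as UTF-8 gives the two-byte form of ñ.
utf8_bytes = name.encode("utf-8")

print(utf8_ok)
print(repr(utf8_bytes))
```

If this behaves as expected, the conversion itself works, and the remaining problem is getting the file decoded as Windows-1252 at read time.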
Example code:
reader = open("my_csv.txt", "r")
for csv_row in reader:
    # insert_row_sql = 'INSERT INTO sometable VALUES (%s,%s,%s,%s,%s,.... )'
    csv_values = csv_row.replace("\n", "").split(',')
    cursor.execute(insert_row_sql, csv_values)
    # blows up, error msg edited
    # Got error UnicodeDecodeError('utf-8', b'Maga\xf1a', 'invalid continuation byte'), errno is utf-8
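For comparison, a rough sketch of the direction I'm considering: decode the file as Windows-1252 while reading, so every row is already text before it goes to the database. It assumes Python 3 and that the whole file is Windows-1252. `read_rows` is a hypothetical helper name, and the temporary file only stands in for my_csv.txt.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="cp1252"):
    """Yield each CSV row as a list of text strings decoded from `encoding`."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as handle:
        for row in csv.reader(handle):
            yield row


# Stand-in for my_csv.txt: one row containing the problem byte 0xF1.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"1,Maga\xf1a\r\n")

rows = list(read_rows(demo_path))
os.remove(demo_path)

# Each row could then be passed on as before, e.g.
# cursor.execute(insert_row_sql, row)
print(rows)
```

Using csv.reader instead of split(',') is a side change; it also copes with quoted fields that contain commas.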