Python: How to deal with replacement character �

Question

I'm reading a CSV file and hitting the error 'invalid continuation byte' (the reported codec is utf-8). It happens when it gets to a line with the name Maga�a, which is supposed to be Magaña. (If I open the CSV file in Atom, my editor, and have it auto-detect the encoding, it chooses Windows-1252 and converts the � to ñ.)

My question is: how do I convert � to ñ when opening or reading the file, so the values can be inserted into a database?

I've tried this as a test:

print 'Maga�a'.decode('windows-1252').encode('utf-8')

which prints Magaï¿½a
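
A minimal sketch of what is probably going on in that test, assuming Python 3 and assuming the raw bytes in the file are the `b'Maga\xf1a'` shown in the error message. The variable names are illustrative only. The point is that the pasted � is already the Unicode replacement character (U+FFFD), not the original byte, so re-decoding it cannot recover the ñ:

```python
# Bytes as they appear in the file, taken from the error message in the question.
raw = b"Maga\xf1a"

# Decoding with the encoding Atom detected recovers the intended name.
name = raw.decode("windows-1252")

# Decoding as UTF-8 fails: 0xF1 announces a multi-byte sequence,
# but the next byte ('a') is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# A literal pasted from the editor holds U+FFFD instead of the original byte.
# Its UTF-8 bytes (EF BF BD), read as Windows-1252, produce three junk characters.
pasted = "Maga\ufffda"
mojibake = pasted.encode("utf-8").decode("windows-1252")

print(name)
print(utf8_error)
print(mojibake)
```

Under these assumptions the fix has to happen when the file's bytes are decoded, not afterwards on a string that already contains �.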

Example code:

reader = open("my_csv.txt", "r")
for csv_row in reader:
  # insert_row_sql = 'INSERT INTO sometable VALUES (%s,%s,%s,%s,%s,.... )'
  csv_values = csv_row.replace("\n", "").split(',')
  cursor.execute(insert_row_sql, csv_values)
  # blows up, error msg edited
  # Got error UnicodeDecodeError('utf-8', b'Maga\xf1a, 'invalid continuation byte'), errno is utf-8
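
One possible approach, sketched under the assumption that the file really is Windows-1252 (as Atom's auto-detection suggests) and that Python 3 is in use: pass the encoding to `open` so each line is decoded correctly before it is split. The helper name `read_rows` and the sample file contents are hypothetical, and the standard `csv` module is swapped in for the manual `split(',')`:

```python
import csv
import os
import tempfile


def read_rows(path, encoding="windows-1252"):
    """Return the CSV rows in `path` as lists of str, decoded with `encoding`."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Build a small sample file containing the problematic byte (hypothetical data).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"1,Maga\xf1a,Ana\r\n2,Smith,Bob\r\n")
    sample_path = tmp.name

rows = read_rows(sample_path)
os.remove(sample_path)

for row in rows:
    print(row)
```

Each `row` is then a list of properly decoded strings that could be passed as the parameters to `cursor.execute`. If the encoding guess is wrong, the text will decode without an error but contain the wrong characters, so it is worth spot-checking a few names.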

Have you tried to print this line directly from the file then decode it? It seems that when you copy the `?` character, it doesn't have the same code as the original. — Liran Funaro, May 09 '17 at 16:40
How are you reading the CSV file? Are you, for example, using the [`csv`](https://docs.python.org/2/library/csv.html) module? — Robᵩ, May 09 '17 at 16:43
No, not using the csv module, just using open and reading each line. Also using python 3 — Sam Luther, May 09 '17 at 16:49
Please provide a short, complete program that demonstrates the error. See [mcve] for more information. — Robᵩ, May 09 '17 at 16:52
