As you have not shown the code that causes the error, I am just guessing.
The only fact is that this string 'Rejected-Case No. doesn’t match'
contains a "’"
which is the unicode character U+2019, RIGHT SINGLE QUOTATION MARK. In the windows cp1252 code page, this character has indeed the code 0x92
.
It looks like you have somewhere a byte string encoded in cp1252 charset that has not been correctly decoded to a unicode string.
What should be done:
There are solutions. Unfortunately, they will depend on the Python version that you are using (2 or 3), and without knowing anything of the code I can only give the generic advices:
- identify the input charset (what the database gives to the Python script)
- identify the output charset (what you want to write in the CSV module)
- use explicit conversions to be able to pass the correct charsets
- optionaly use
error=replace
in the encoding/decoding calls to avoid the UnicodeError exceptions.
If you use Python3, I will assume that you have a problem in decoding unicode from the database. The RIGHT SINGLE QUOTATION MARK has unicode code U+2019, but in the string given to Python is coded '\x92'
which is the cp1252
byte encoding. A quick and dirty fix is to force an encoding/decoding pass to get a correct unicode string. Your code could become:
db = pymysql.connect(host='host', port=3306, user="user", passwd="secret",
db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")
cur.execute("""select fields from table""")
charset = 'cp1252' # or 'utf8' depending on what you want in the csv file
with open('O:\file\path\to\File_name.csv','w', encoding=charset,
errors='replace', newline='') as fileout:
writer = csv.writer(fileout)
for row in cur.fetchall():
writer.writerow([field.encode('latin1').decode('cp1252', errors='replace')
for field in row])
The encode('latin1').decode('cp1252')
is just a trick to fix a Python3 string where characters have the code of a byte encoding. It works because latin1
encoding is a no-op for all codes under 256.
The errors=replace
option, ask Python to never raise an UnicodeError exception but instead to replace the offending character with a '?'
for a byte string or with the official unicode REPLACEMENT CHARACTER U+FFFD '�'
for an unicode string.
It would probably be cleaner to use the charset
option of pymysql.connect
. Unfortunately, I have never used MySQL databases from Python...