0

I am trying to create a CSV of data from a MySQL RDB to move it over to Amazon Redshift. However, one of the fields contains descriptions and some of those descriptions contain the '’' character, or the right single quotation mark. before when I would run the code, it would give me

UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 62: character maps to <undefined>

I then tried using REPLACE to attempt to get rid of the right single quotation marks.

db = pymysql.connect(host='host', port=3306, user="user", passwd="password", db="db", autocommit=True)
cur = db.cursor()
#cur.execute("call inv1_view_prod.`Email_agg`")

cur.execute("""select field_1, 
        field_2, 
        field_3, 
        field_4, 
        replace(field_4_desc,"’","") field_4_desc, 
        field_5, 
        field_6, 
        field_7 
from table_name """) 


emails = cur.fetchall()
with open('O:\file\path\to\file_name.csv','w') as fileout:
        writer = csv.writer(fileout)
        writer.writerows(emails)   
time.sleep(1)

However, this gave me the error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 132: ordinal not in range(256)

And I noticed 132 is the position of the right single quotation mark in the SQL statement so I beieve the code itself may be having an issue with it. I tried using the regular straight apostrophe instead of the right single quotation mark in the REPLACE statement, however this did not replace the character and still came back with the original error. Does anyone know why it won't accept the single quote and how to fix it?

A. Pearson
  • 191
  • 6
  • 20
  • Why are you quoting the `cur.execute ("""`. That could just read `cur.execute("` The other double quotes in the cur.execute string are also not allowed (it will see those as end of string). – Norbert Jul 03 '18 at 15:30
  • Does the database table have the text encoding Latin-1 but somehow contains a non-Latin-1 character? – Arndt Jonasson Jul 03 '18 at 17:09
  • Side-note: This code will break anyway; `'O:\file\path\to\file_name.csv'` is actually (with special characters named in angle brackets to make it clear) `O:
    ile\pathoile_name.csv`. This is why you always use raw strings for Windows paths, e.g. `r'O:\file\path\to\file_name.csv'` (note `r` preceding open quote). You might be getting away with it on your real path (if none of the directories or file names begin with a character that responds to `str` literal escapes), but it's sheer luck; use raw strings consistently so you don't rely on luck.
    – ShadowRanger Nov 10 '20 at 14:44

1 Answers1

0

\u2019 is Unicode for , UTF-8 hex E28099, which is a "RIGHT SINGLE QUOTATION MARK". The direct latin1 equivalent is hex 92. Some word processing products use that instead of apostrophe (').

You are getting the error messages, not because you can't handle the character, but because the configuration fails to declare which encoding is used where.

"132" seems irrelevant: 132 84 E2809E „ &#x84;

Notes on Python: http://mysql.rjweb.org/doc.php/charcoll#python
Notes on other charset issues: Trouble with UTF-8 characters; what I see is not what I stored

Without knowing the schema or the Python configuration, I can't be more specific.

Rick James
  • 135,179
  • 13
  • 127
  • 222