
tl;dr I have a complicated SQL query that returns results in .csv format. Unfortunately, the downside to being Canadian is that a bunch of people just love to add accents to their names.

At the moment, this is what I have. I didn't write the script initially; I'm just trying to make it work to grab the data. The SQL is correct, just truncated by nano.

def getCommentsByGUID(reportDay):
    conn = mysql.connector.connect(host=dbServer, user=dbUser, passwd=dbPass, db=tscDBName)
    sqlresult = ''
    sql = 'select d.documentId, dn.created, dn.notes as Comment, u.login, d.fileName from DocumentInfo d \
inner join documentnotes dn on d.documentId=dn.documentId \
inner join User u on dn.userId=u.userID \
where d.companyId=%d and dn.created>\'%s 00:00:00\' and dn.created <\'%s 23:59:59\';' % (companyID, reportDay, reportDa$
    cursor = conn.cursor()
    cursor.execute(sql)
    sqlresult = cursor.fetchall()
    cursor.close()
    reportWriter = csv.writer(open('%sFilename_comments_%s.csv' % (outputDir, reportDay), 'w'), delimiter=',', quotec$
    for results in sqlresult:
        reportWriter.writerow(results)

Running it as is raises a Unicode error such as:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)
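
For what it's worth, the failing character reproduces the error on its own, with no csv involved (this snippet is just an illustration; it runs under both Python 2 and 3):

```python
# u'\u2019' is a right single quotation mark, the kind word processors
# substitute for plain apostrophes. The default ASCII codec only covers
# code points 0-127, so encoding it fails exactly as in the traceback.
try:
    u'\u2019'.encode('ascii')
except UnicodeEncodeError as err:
    print(err)

# Encoding to UTF-8 instead succeeds (three bytes):
print(repr(u'\u2019'.encode('utf-8')))
```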

Now, after a lot of research, I've come across something like this:

for results in sqlresult:
    try:
        reportWriter.writerow(results)
    except UnicodeEncodeError:
        s = list(results)
        for item in s:
            if isinstance(item, basestring) == True:
                a = s.index(item)
                unicodedata.normalize('NFKD', item).encode('ascii', 'ignore')
                s[a] = item
                print item
        s = tuple(s)
        print s
        reportWriter.writerow(s)

Yet I still get the same error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)

Any ideas on what I'm either doing wrong, or what else I can try? Thanks!

Don Julio
  • If you use `cat your-file.py` to echo the file to your shell window, you'll be able to post your script, which would be extra helpful because the most important line is being cut off :) you may also want to look at this question among others: http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20 – Josh Rumbut Dec 07 '15 at 21:52
  • Here's the line that opens the file for writing: `reportWriter = csv.writer(open('%sDundee_Comments_%s.csv' % (outputDir, reportDay), 'w'), delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)`. That said, I've already made it work thanks to jwodder's answer below: `for results in sqlresult: results = [x.encode('utf-8') if isinstance(x, unicode) else x for x in results]` followed by `reportWriter.writerow(results)` – Don Julio Dec 08 '15 at 01:42

1 Answer


The MySQL connection is apparently returning strings as unicode objects, while the filehandle passed to csv.writer was not opened with a specific encoding and thus expects to be given str objects representing raw bytes.

Solution 1: Encode the strings in your preferred encoding (presumably UTF-8) before passing them to writerow:

for results in sqlresult:
    results = [x.encode('utf-8') if isinstance(x, unicode) else x for x in results]
    reportWriter.writerow(results)
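
If you need this in more than one place, Solution 1 can be wrapped in a small helper. This is only a sketch: `encode_row` is a hypothetical name, and the `text_type` alias exists solely so the snippet also runs under Python 3 (where text is `str` and you would not pre-encode for `csv` at all):

```python
# Compatibility alias: text strings are `unicode` on Python 2, `str` on Python 3.
try:
    text_type = unicode
except NameError:
    text_type = str

def encode_row(row, encoding='utf-8'):
    # Encode every text value in the row to bytes; pass numbers, dates,
    # None, etc. through unchanged. On Python 2, csv.writer expects byte
    # strings, so this must happen before writerow().
    return [x.encode(encoding) if isinstance(x, text_type) else x
            for x in row]

# Demo with a row containing an accented name:
row = (17, u'Ren\xe9e', 3.5)
print(encode_row(row))
```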

Solution 2: Open the output filehandle in a Unicode-aware mode:

import io

reportWriter = csv.writer(io.open('%sFilename_comments_%s.csv' % (outputDir, reportDay), 'w', encoding='utf-8'), delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
jwodder
  • This is perfect, thanks! Solution 1 did exactly what I needed it to. Solution 2, I've already tried, but io.open had some issues. – Don Julio Dec 08 '15 at 01:40