I recently inherited a Python project and I'm seeing some behavior I'm struggling to account for.
The code has two parts: it can import a file into the database, or it can dump the database to an output file. The import looks something like this:
    def importStuff(self):
        mysqlimport_args = ['mysqlimport', '--host='+self.host, '--user='+self.username,
                            '--password='+self.password, '--fields-terminated-by=|',
                            '--lines-terminated-by=\n', '--replace', '--local',
                            self.database, filename, '-v']
        output = check_output(mysqlimport_args)
The dump looks like this:
    def getStuff(self):
        db = MySQLdb.connect(self.host, self.username, self.password, self.database)
        cursor = db.cursor()
        sql = 'SELECT somestuff'
        cursor.execute(sql)
        records = cursor.fetchall()
        cursor.close()
        db.close()
        return records

    def toCsv(self, records, csvfile):
        f = open(csvfile, 'wb')
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(['StuffId'])
        count = 1
        for record in records:
            writer.writerow([record[0]])
        f.close()
Okay, it's not the prettiest Python you'll ever see (style comments welcome, as I'd love to learn more), but it seems reasonable.
However, I got a complaint from a consumer that my output wasn't in UTF-8 (the MySQL table uses the utf8 encoding, by the way). Here's where I get lost. If the program executes like this:
    importStuff(...)
    getStuff(...)
    toCsv(...)
Then the output file doesn't appear to be valid UTF-8. When I break the execution into two separate steps:
    importStuff(...)
then in another file:
    getStuff(...)
    toCsv(...)
Suddenly my output appears to be valid UTF-8. Aside from the fact that I have a workaround, I can't explain this behavior. Can anyone shed some light on what I'm doing wrong here? Or is there more information I can provide that might clarify what's going on?
Thanks.
(Python 2.7, in case that factors in)
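Since the two ways of running give different results, comparing the two output files byte by byte may help narrow things down. A minimal sketch, assuming each run's output was saved to its own file and both fit in memory; `first_difference` and the file names are hypothetical and not part of the project:

```python
import io


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    # Read raw bytes so that no decoding happens behind the scenes.
    with io.open(path_a, 'rb') as fa, io.open(path_b, 'rb') as fb:
        a, b = fa.read(), fb.read()
    # Slicing yields a bytes object on both Python 2.7 and Python 3.
    for i in range(min(len(a), len(b))):
        if a[i:i + 1] != b[i:i + 1]:
            return i
    # One file is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical usage:
# print(first_difference('one_step.csv', 'two_step.csv'))
```

An offset of 0 would suggest the files differ from the very first byte, while a later offset points at a specific record to inspect.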
EDIT: More code as requested. I've made some minor tweaks to protect the innocent such as my company, but it's more or less here:
    def main():
        dbutil = DbUtil(config.DB_HOST, config.DB_DATABASE, config.DB_USERNAME, config.DB_PASSWORD)
        if(args.import):
            logger.info('Option: --import')
            try:
                dbutil.mysqlimport(AcConfig.DB_FUND_TABLE)
            except Exception, e:
                logger.warn("Error occured at mysqlimport. Error is %s" % (e.message))
        if(args.db2csv):
            try:
                logger.info('Option: --db2csv')
                records = dbutil.getStuff()
                fileutil.toCsv(records, csvfile)
            except Exception, e:
                logger.warn("Error Occured at db2csv. Message:%s" %(e.message))

    main()
And that's about it. It's really short, which makes the problem all the less obvious.
I'm not sure how to faithfully represent the output, but it looks something like this:
    "F0NR006F8F"
They all look more or less like ASCII characters to me, so I'm not sure what problem they could be creating. Maybe I'm approaching this from the wrong angle: I'm currently relying on my text editor's best guess for what encoding a file is in, and I'm not sure how best to detect which character is causing it to stop reading my file as UTF-8.
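One way to check the file without relying on an editor's guess is to decode its raw bytes strictly and report where decoding first fails. A minimal sketch, assuming the file is small enough to read into memory; `find_invalid_utf8` is a hypothetical helper name, not part of the project:

```python
import io


def find_invalid_utf8(path):
    """Return (offset, bad_bytes) for the first undecodable byte, or None."""
    # Open in binary mode so the bytes are seen exactly as written.
    with io.open(path, 'rb') as f:
        data = f.read()
    try:
        data.decode('utf-8')  # strict decoding raises on the first bad byte
    except UnicodeDecodeError as e:
        # e.start and e.end delimit the offending bytes within data.
        return e.start, data[e.start:e.end]
    return None


# Hypothetical usage:
# print(find_invalid_utf8('output.csv'))
```

If this returns None, the file is valid UTF-8 and the complaint may come from something else, such as how the consumer or the editor detects the encoding.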