Python's encoding difference

Question

# my scraper script file
#-*- coding: utf-8 -*-
from selenium import webdriver
import csv

browser = webdriver.Firefox()
browser.get("http://web.com")

f = open("result.csv", 'w')
writer = csv.writer(f)

Then the first method:

element = browser.find_element_by_xpath("xpath_addr")
temp = [element.get_attribute("innerHTML").encode("utf-8")]
print temp                # ['\xec\x84\something\xa8']
writer.writerow(temp)

This produces a correct CSV file in my language (e.g. 한글).

But the second case, which I think is only slightly different:

element = browser.find_element_by_xpath("xpath_addr")
temp = element.get_attribute("innerHTML").encode("utf-8")
print temp                # "한글" 
writer.writerow(temp)

Then the CSV file is full of garbage characters. What causes this difference? print also gives different results; why? (It is probably due to my limited knowledge of encoding.)

score 3 · Accepted Answer · edited May 23 '17 at 12:04

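First, writerow expects a sequence of fields (a list or similar), so the first snippet uses the interface correctly. In your second snippet you pass a plain string; writerow treats it as a sequence and iterates over it, writing each element as a separate field. Because temp is an encoded byte string, each element is a single byte, so the multibyte UTF-8 characters are split apart by commas, which is why the file looks like garbage. That is probably not what you wanted. Try writerow([temp]); its output should match the first case. The print difference has a related cause: printing a list shows the repr of each element, so the UTF-8 bytes appear as \x escapes, while printing the str itself writes the raw bytes to your terminal, which (if it is a UTF-8 terminal) displays them as 한글.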
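A minimal sketch of the difference, written for Python 3 rather than the Python 2 used in the question, so the details differ slightly: here iterating a str yields characters, whereas in Python 2 iterating an encoded str yields bytes. The helper write_row is hypothetical, added only for illustration; the last part simulates the Python 2 byte-by-byte splitting to show why the output turns into replacement characters.

```python
import csv
import io


def write_row(row):
    """Return the CSV text that csv.writer produces for a single row (illustrative helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


text = "한글"

# Passing the string directly: writerow iterates it, one field per character.
print(repr(write_row(text)))    # '한,글\r\n'

# Wrapping it in a list: a single field containing the whole string.
print(repr(write_row([text])))  # '한글\r\n'

# Simulated Python 2 behaviour: an encoded str iterates byte by byte, so every
# byte of each multibyte UTF-8 sequence becomes its own comma-separated field.
encoded = text.encode("utf-8")
split_bytes = b",".join(bytes([b]) for b in encoded)

# The broken sequences can no longer be decoded as the original characters.
print(split_bytes.decode("utf-8", errors="replace"))
```

The fix in both versions is the same: wrap the value in a list so that it becomes a single field.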

Second, be warned that the Python 2 csv module is notorious for Unicode headaches; its documentation states that it does not support Unicode input. Try unicodecsv as a drop-in replacement for the csv module if you need Unicode support. Then you won't need to encode the strings before writing them to the file; you write the unicode objects directly and let the library handle the encoding.
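For comparison, a sketch assuming Python 3, where the standard csv module accepts str (Unicode) values directly, so neither manual .encode("utf-8") nor unicodecsv is needed. This is the Python 3 csv API, not unicodecsv's. The temporary path stands in for the question's "result.csv" and is only there to keep the example self-contained.

```python
import csv
import os
import tempfile

rows = [["한글", "second field"]]

# Stand-in for "result.csv" from the question, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "result.csv")

# Open in text mode with an explicit encoding and newline="" (as the Python 3
# csv docs recommend); the writer takes str values and the file object encodes them.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the file back to check that the Korean text survived the round trip.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)  # [['한글', 'second field']]
```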

You're right, it's working, thanks! I should have paid more attention to what kind of object is expected. Thanks for the unicodecsv tip too (a newbie like me would have used the csv module forever without it)! — Jeongbin Kim, Dec 28 '15 at 16:11

