0

I'm trying to write some news headlines to a CSV file using Python's csv module. Whenever a headline contains an apostrophe, such as 'What’s So Great About Snapchat Anyway?', an encode error shows up.

The error is as below:

[image: screenshot of the error]

Code for this:

[image: screenshot of the code]

Does anyone have thoughts on this error, or any suggestions?

Ian Zhang
  • The problem is not CSV but the terminal/console on your system (probably Windows): it doesn't display `UTF-8` and has trouble converting the text. [Change default code page of Windows console to UTF-8](http://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8) – furas Feb 05 '17 at 23:59
  • Thanks for replying, Furas! I figured out that it's because the Python CSV module doesn't support Unicode... Here's a useful post: [link](http://stackoverflow.com/questions/3224268/python-unicode-encode-error) – Ian Zhang Feb 06 '17 at 00:10
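
One way to tell the two explanations in these comments apart is to check whether the encoding reported for the console can represent the headline at all. The snippet below is only a rough sketch, written for Python 3; `can_encode` is a hypothetical helper introduced for illustration, and the fallback to `"ascii"` when no encoding is reported is an assumption.

```python
import sys

# The headline from the question; \u2019 is the curly apostrophe.
headline = "What\u2019s So Great About Snapchat Anyway?"


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# sys.stdout.encoding may be None when output is redirected; fall back to ASCII.
console_encoding = sys.stdout.encoding or "ascii"
print(console_encoding)
print(can_encode(headline, console_encoding))
```

If the second line printed is `False`, printing the headline to that console is likely to fail regardless of what the csv module does.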

1 Answer

1

The Python 2.7 csv module can't handle Unicode natively, but the docs have an example of how to do it: the UnicodeWriter class. You can also try Python 3, where the csv module handles Unicode natively.

The following snippet has been shamelessly ripped from the docs I linked:

# Python 2 only: cStringIO does not exist in Python 3.
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

Then you can use it like this:

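# Open in binary mode, as the Python 2 csv docs recommend, and close the file when done.
with open("foo", "wb") as f:
    writer = UnicodeWriter(f)
    writer.writerow([u'1', u'bar'])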
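
For the Python 3 route mentioned above, a minimal sketch might look like the following. It assumes the headlines are already `str` objects; the output path and the `headline` column name are made up for illustration and are not from the question.

```python
import csv
import os
import tempfile

# The headline from the question; \u2019 is the curly apostrophe.
headlines = ["What\u2019s So Great About Snapchat Anyway?"]

# Hypothetical output location, chosen only so the sketch can run anywhere.
path = os.path.join(tempfile.gettempdir(), "headlines.csv")

# newline="" lets the csv module control line endings;
# encoding="utf-8" makes the file encoding explicit instead of platform-dependent.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["headline"])
    for headline in headlines:
        writer.writerow([headline])

# Read the file back to confirm the apostrophe survived the round trip.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

Passing `encoding` explicitly matters mostly on Windows, where the default file encoding is often not UTF-8.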
Greg
  • Thanks for replying, Greg! You are totally right. I solved this problem by adding `title = content.text.encode('ascii', 'ignore')` where I grab the title. – Ian Zhang Feb 06 '17 at 00:10
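
Note that the `encode('ascii', 'ignore')` workaround in the comment above is lossy: any character outside ASCII is silently dropped. A small sketch of the difference, using the headline from the question (the variable names are illustrative only):

```python
# The headline from the question; \u2019 is the curly apostrophe.
headline = u"What\u2019s So Great About Snapchat Anyway?"

# 'ignore' drops every character that ASCII cannot represent.
ascii_only = headline.encode("ascii", "ignore")

# UTF-8 keeps the character, so decoding restores the original text.
utf8_bytes = headline.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(ascii_only)              # the apostrophe is gone: "Whats So Great ..."
print(round_trip == headline)  # True
```

If the apostrophe needs to survive in the CSV file, encoding to UTF-8 (or using the UnicodeWriter class from the answer) avoids the loss.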