Here is my code

The following works but writes the wrong string

import csv
import codecs


if __name__ == "__main__":

    # This works for writing unico but writes wrong string
    with codecs.open("./why_unicode.csv", "wb" ) as csv_file:

        writer = csv.writer(csv_file)


        unico = u'IP\u4e13\u7528\u8033\u673a.\u9ed1'.encode('utf-8')
        writer.writerow(unico)

Here is the result

I,P,ä,¸,“,ç,”,¨,è,€,³,æ,œ,º,.,é,»,‘

which is not the correct string. The correct string is 'IP专用耳机.黑'

This doesn't work

import csv
import codecs


if __name__ == "__main__":

    with codecs.open("./why_unicode.csv", "wb", 'utf-8' ) as csv_file:

        writer = csv.writer(csv_file)
        unico = u'IP\u4e13\u7528\u8033\u673a.\u9ed1'

        writer.writerow(unico)

Here is the error

SyntaxError: Non-ASCII character '\xe4' in file 
test_unicode.py on line 15, but no encoding declared; 
see http://www.python.org/peps/pep-0263.html for details

And this won't run at all

import csv
import codecs


if __name__ == "__main__":

     with codecs.open("./why_unicode.csv", "wb") as csv_file:

        writer = csv.writer(csv_file)


        unico = u'IP\u4e13\u7528\u8033\u673a.\u9ed1'.encode('utf-8')
        #chn = u'IP专用耳机.黑' # even commenting out will return error
        writer.writerow(unico)

The standard advice for this kind of question on Stack Overflow is either to use codecs or to call encode('utf-8'). I tried both, but neither works, which is confusing. Can someone help me out?

Edit:

The script runs on Python 2.7.3 (according to python -V).

Kevin
    First, it would be nice to know which Python version you are using. Python 2 and Python 3 behave fundamentally differently when handling Unicode characters. I came across a similar problem with the µ symbol in Python 3. I tried this in Python 3 just a moment ago and got b'IP\xe4\xb8\x93\xe7\x94\xa8\xe8\x80\xb3\xe6\x9c\xba.\xe9\xbb\x91' as output... – Ramon Jul 02 '15 at 22:53

    Ah, the error is telling you the problem: Python assumes ASCII in your source file. You can declare the file encoding with something like "# -*- coding: utf-8 -*-" at the top of your file. Maybe that's enough; I just stumbled over this thread: http://stackoverflow.com/questions/6289474/working-with-utf-8-encoding-in-python-source – Ramon Jul 02 '15 at 22:56

    You cannot use unicode with the csv module in Python 2: http://stackoverflow.com/questions/30551429/error-writing-data-to-csv-due-to-ascii-error-in-python/30551550#30551550. Add the encoding declaration and your code should run once you encode. – Padraic Cunningham Jul 02 '15 at 23:17

1 Answer

There are several errors in your code.

To fix SyntaxError: Non-ASCII character '\xe4' in file, add the encoding declaration on the first or second line of the source file: # -*- coding: utf-8 -*-. The error just means that somewhere in the source code you've used a literal non-ASCII character (even if you did it in a comment).

The next error is the incorrect csv usage. writerow() accepts a row -- a sequence of items. You should not pass it a bytestring that is a sequence of bytes unless you want each column to be a single byte (you probably don't):

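#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import csv

text = u'IP专用耳机.黑'
# Python 2's csv module works on bytes, so open the file in binary mode.
with open("why_unicode.csv", "wb") as csv_file:
    writer = csv.writer(csv_file)
    for i in range(3):
        # Pass a list: one item per column, with the text encoded to UTF-8.
        writer.writerow([text.encode('utf-8'), i])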

Note: you don't need codecs to write bytes.
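
The garbled line in the question can be reproduced without the csv module. This is a minimal sketch, assuming the file was later viewed as Windows-1252 (cp1252); that viewer encoding is a guess, not something the question states. It splits the UTF-8 bytestring into one "column" per byte, which is effectively what writerow() does with a Python 2 bytestring, and then decodes the joined line the way such a viewer would.

```python
# -*- coding: utf-8 -*-
text = u'IP\u4e13\u7528\u8033\u673a.\u9ed1'
data = text.encode('utf-8')

# Slicing keeps every element a one-byte bytestring on Python 2 and Python 3.
columns = [data[i:i + 1] for i in range(len(data))]
line = b','.join(columns)

# Assumption: the viewer interprets the bytes as cp1252 instead of UTF-8.
as_seen_in_viewer = line.decode('cp1252')

# Removing the separators and decoding as UTF-8 gives the text back,
# so the bytes were only split into columns, not corrupted.
recovered = line.replace(b',', b'').decode('utf-8')

print(len(columns))
print(recovered == text)
```

The value of as_seen_in_viewer matches the "I,P,ä,¸,..." line from the question, with one column for each of the 18 bytes.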

To avoid encoding each Unicode string manually, see UnicodeWriter example in csv docs.
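
A lighter alternative to a full writer class is a small helper that encodes a whole row at once. The sketch below is not the UnicodeWriter from the docs; encode_row is a hypothetical name introduced here, and UTF-8 is assumed as the target encoding.

```python
# -*- coding: utf-8 -*-

def encode_row(row, encoding='utf-8'):
    """Return a copy of row with every Unicode string encoded to bytes.

    Hypothetical helper, not part of the csv module. Other values
    (ints, bytestrings, ...) are passed through unchanged.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


row = encode_row([u'IP\u4e13\u7528\u8033\u673a.\u9ed1', 0])
print(row)
```

On Python 2 the result can be passed straight to the writer, for example writer.writerow(encode_row([text, i])). On Python 3 the csv module accepts text directly, so no such helper is needed there.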

jfs