How to write UTF-8 in a CSV file

Question

I am trying to create a text file in CSV format out of a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:

import codecs
...
myfile = codecs.open(filename, 'w','utf-8')
...
f = result.table.item(i,c).text()
myfile.write(f+";")

It works until the cell contains a special character. I tried also with

myfile = open(filename, 'w')
...
f = unicode(result.table.item(i,c).text(), "utf-8")

But it also stops when a special character appears. I have no idea what I am doing wrong.

"it salso tops"? What does that mean? What error do you get? What is your input? — , Sep 12 '13 at 14:44
The input is a pyqt4 QTableWidgetItem. The problem is that i don't get any error because script is running as a plugin. — Martin, Sep 12 '13 at 14:48
Found the solution. I had to write `myfile.write(u"%s" % f + ";")` — Martin, Sep 12 '13 at 15:11
See also: [How do I read and write CSV files with Python?](https://stackoverflow.com/a/41585079/562769) — Martin Thoma, Aug 21 '17 at 20:18

score 134 · Accepted Answer · answered May 21 '16 at 14:50

It's very simple for Python 3.x (docs).

import csv

with open('output_file_name', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerow(['my_utf8_string'])  # pass a list; a bare string would be split into one column per character

For Python 2.x, look here.
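
As a slightly fuller sketch of the same approach, the snippet below writes a few rows containing non-ASCII text and numbers, then reads the file back to check the round trip. The sample rows and the temporary file path are made up for illustration; `delimiter=';'` follows the question. In Python 3 every `str` is already Unicode text, so the `encoding='utf-8'` argument to `open` is what decides the bytes on disk.

```python
import csv
import os
import tempfile

# Illustrative data standing in for the table cells (not from the question)
rows = [
    ['name', 'city', 'value'],
    ['Renée', 'Zürich', 42],
    ['Søren', 'Kraków', 3.5],
]

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'table.csv')

# newline='' lets the csv module control line endings itself
with open(path, 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(rows)

# Read the file back with the same encoding and delimiter
with open(path, newline='', encoding='utf-8') as csv_file:
    read_back = list(csv.reader(csv_file, delimiter=';'))
```

Note that the reader returns every field as a string, so the integer `42` comes back as `'42'`.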

answered May 21 '16 at 14:50 by Zanon

what if the content to `writerow` is not a utf-8? will it work? – CKM Jun 29 '18 at 11:43
Great no need for third party pip installs. – Vaibhav Vishal Jul 27 '19 at 11:33
i'm not using a file, i'm using `sys.stdout` so how the content can be utf8 in that case ? – Ricky Levi May 03 '22 at 12:45
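
For the `sys.stdout` case asked about above, one possible approach (a sketch, not the only option) is to wrap the underlying binary stream in an `io.TextIOWrapper` with an explicit encoding. The helper name `write_rows_utf8` is hypothetical, and an in-memory `io.BytesIO` stands in for `sys.stdout.buffer` so the example is self-contained.

```python
import csv
import io


def write_rows_utf8(binary_stream, rows, delimiter=';'):
    """Write rows as UTF-8 encoded CSV to a binary stream (hypothetical helper)."""
    text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8', newline='')
    try:
        csv.writer(text_stream, delimiter=delimiter).writerows(rows)
        text_stream.flush()
    finally:
        # Detach so the wrapper does not close the caller's stream
        text_stream.detach()


# io.BytesIO plays the role of sys.stdout.buffer in this demonstration
buffer = io.BytesIO()
write_rows_utf8(buffer, [['Zürich', 'μ', 42]])
encoded = buffer.getvalue()
```

With a real terminal or pipe, the call would be `write_rows_utf8(sys.stdout.buffer, rows)`. On Python 3.7 and later, `sys.stdout.reconfigure(encoding='utf-8')` is an alternative that changes the encoding of `sys.stdout` itself.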

score 107 · Answer 2 · answered Jul 26 '15 at 21:19

From your shell run:

pip2 install unicodecsv

And (unlike the original question) presuming you're using Python's built-in csv module, turn `import csv` into `import unicodecsv as csv` in your code.

answered Jul 26 '15 at 21:19 by the · edited May 11 '17 at 18:39

It didn't work just by replacing the import, I also had to add the encoding when creating the writer: `writer = csv.writer(out, dialect='excel', encoding='utf-8')`, and create the file handler with `open(...`, **not** `codecs.open(...`. – Suzana Feb 07 '16 at 18:46
I tried all suggestions on StackOverflow and only this one works for me. – Charles Chow Jun 14 '16 at 05:16

score 14 · Answer 3 · answered Mar 24 '14 at 11:07

Use this package, it just works: https://github.com/jdunck/python-unicodecsv.

answered Mar 24 '14 at 11:07 by Gijs

score 7 · Answer 4 · answered Sep 27 '17 at 15:11

For me, the UnicodeWriter class from the Python 2 csv module documentation didn't really work, because it breaks the csv.writer.writerow() interface.

For example:

csv_writer = csv.writer(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

works, while:

csv_writer = UnicodeWriter(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

will throw AttributeError: 'int' object has no attribute 'encode'.

As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:

def to_utf8(lst):
    return [unicode(elem).encode('utf-8') for elem in lst]

...
csv_writer.writerow(to_utf8(row))

Or we can even monkey-patch csv_writer to add a write_utf8_row function - the exercise is left to the reader.
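
One way the exercise could be approached is sketched below. It uses a thin wrapper rather than a true monkey-patch, on the assumption that the built-in writer object may not accept new attributes. The class name `Utf8RowWriter` is hypothetical, and `write_utf8_row` is the function name suggested above. The sketch encodes only text values, so integers and `None` are left for the csv module to format. On Python 3 no encoding is needed and the row is passed through unchanged.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on Python 2)


class Utf8RowWriter(object):
    """Hypothetical wrapper around csv.writer that adds write_utf8_row."""

    def __init__(self, csv_file, **kwargs):
        self._writer = csv.writer(csv_file, **kwargs)

    def __getattr__(self, name):
        # Called only for attributes missing on the wrapper: delegate
        # writerow, writerows and dialect to the real writer
        return getattr(self._writer, name)

    def write_utf8_row(self, row):
        if PY2:
            # Encode text values only; numbers and None stay untouched
            row = [value.encode('utf-8') if isinstance(value, text_type) else value
                   for value in row]
        return self._writer.writerow(row)


# Python 2's csv module writes bytes, Python 3's writes text
buffer = io.BytesIO() if PY2 else io.StringIO()
writer = Utf8RowWriter(buffer, delimiter=';', lineterminator='\n')
writer.write_utf8_row([u'The meaning', 42, None, u'\u03bc'])
output = buffer.getvalue()
```

Because non-string values are not passed through `unicode()`, `None` is written as an empty field and numbers keep the csv module's own formatting.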

much simpler solution for py2.x for those of us still stuck with using it. — khan, Nov 23 '20 at 17:21

score 2 · Answer 5 · answered Sep 12 '13 at 16:47

The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples

(can't copy the code here because it's protected by copyright)

answered Sep 12 '13 at 16:47 by Aaron Digulla

Thanks for the link. It was helpful. For my knowledge, even if you have posted the link you can't copy paste the code here? (+1 for honoring the copyright) – Mutant Aug 20 '15 at 16:46
@Mutant: Code isn't like scientific papers. Code is protected by copyright. While I'm 99.999% sure that the Python owners wouldn't sue SO for copying their code, I didn't feel like reading their [lengthy license](https://docs.python.org/2/license.html#history-and-license) to find out whether it's allowed or not. Also, it's good to remind people once in a while that "I can see it on my monitor" != "I can do whatever I want with it" :-) – Aaron Digulla Aug 21 '15 at 08:02
Thanks for the reminder. Unfortunately the world we live in has become so (unreasonably) fast and careless, with information flowing faster than one can imagine, that it does require a reminder once in a while about the restrictions that matter. Thanks for that :) – Mutant Aug 21 '15 at 16:49
The docs link is semi-useful (examples are better), but the "copyright" argument here is overblown and asinine. Python is explicitly open source ([v2](https://docs.python.org/2/license.html) [v3](https://docs.python.org/3/license.html)). The license is clear: "royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute ... [etc., etc.]" Even the simple phrase at the top of the page, "GPL-compatible" should give you comfort. Share open source stuff. Even modify it if you want to. It's open source for a reason. – alttag Nov 14 '17 at 20:44
@alttag Copying or using GPLd code in a project means that all the other code in the same project is now under GPL as well. Since I'm not a copyright lawyer, I don't know what that means with regards to code published on a web site. – Aaron Digulla Nov 28 '17 at 13:03

score 0 · Answer 6 · answered Jan 29 '19 at 11:11

For Python 2, you can run this code on your rows before calling csv_writer.writerows(rows). It leaves integers (and other non-string values) untouched rather than converting them to UTF-8 strings.

def encode_rows_to_utf8(rows):
    encoded_rows = []
    for row in rows:
        encoded_row = []
        for value in row:
            if isinstance(value, basestring):
                value = unicode(value).encode("utf-8")
            encoded_row.append(value)
        encoded_rows.append(encoded_row)
    return encoded_rows

score 0 · Answer 7 · answered May 11 '22 at 09:52

I tried using Bojan's suggestion but it turned all the None cells into the word None rather than blank, and rendered floats as 1.231111111111111e+11, maybe other annoyances. Plus, I want my program to run under both Python3 and Python2. So, I ended up putting at the top of the program:

import csv
import logging
import os

try:
    csv.writer(open(os.devnull, 'w')).writerow([u'\u03bc'])
    PREPROCESS = lambda array: array
except UnicodeEncodeError:
    logging.warning('csv module cannot handle unicode, patching...')
    PREPROCESS = lambda array: [
        item.encode('utf8')
        if hasattr(item, 'encode') else item
        for item in array
    ]

Then changed all csvout.writerow(row) statements to csvout.writerow(PREPROCESS(row))

I could have used the test if sys.version_info < (3,): instead of the try statement but that violates "duck typing". I may revisit it and write that first one-liner properly with with statements, to get rid of the dangling open file and writer, but then I'd have to use ALL_CAPS variable names or pylint would complain... it should get garbage collected anyway, and in any case only lasts while the script is running.

score -2 · Answer 8 · answered Jan 15 '17 at 13:38

A very simple hack is to use the json module instead of csv. For example, instead of csv.writer, just do the following:

    fd = codecs.open(tempfilename, 'wb', 'utf-8')
    for c in whatever:
        fd.write(json.dumps(c)[1:-1])  # json.dumps writes ["a", ...]; drop the brackets
        fd.write('\n')
    fd.close()

Basically, given the list of fields in correct order, the json formatted string is identical to a csv line except for [ and ] at the start and end respectively. And json seems to be robust to utf-8 in python 2.*
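
Before relying on this hack, it may be worth comparing the two formats on a row that contains a double quote and a non-ASCII character. The comparison below is a small sketch with a made-up sample row. It suggests the output is not always identical to what the csv module produces: `json.dumps` escapes quotes with a backslash, writes non-ASCII characters as `\uXXXX` escapes by default, and quotes every string.

```python
import csv
import io
import json

# Illustrative row: non-ASCII text, an embedded double quote, a number
row = ['café', 'say "hi"', 42]

# The hack: JSON-encode the list and strip the surrounding brackets
json_line = json.dumps(row)[1:-1]

# The csv module's rendering of the same row
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerow(row)
csv_line = buffer.getvalue().rstrip('\n')

print(json_line)
print(csv_line)
```

A CSV reader expects embedded quotes to be doubled (`""`), so a consumer may misparse the backslash-escaped form.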
