Storing a list of Unicode strings in a CSV file (Python 2)

Question

I created a function, parse_html(param), that returns a list like the one below:

list = [u'John', u'Muchia', u'Prozessoptimierung Fahrwiderst\xe4nde']

If I print list[2] inside my function, it gives me Prozessoptimierung Fahrwiderstände, which is perfect, but the string appears differently when it is inside a list.

The problem arises when I return the whole list (return list): I want to avoid the 'u'. I want to store a list of strings, and the Unicode characters like ä, ö and ü should also appear.

fname[x] is the source HTML file, where x is the file number, which is incremented from 0 to count (the number of files).

list=[]
newlist=[]    
list = parse_html(fname[7])
for row in list:
  drow = row.encode('utf-8')
  newlist.append(drow)
print newlist

The goal is to save the returned list to a CSV file. Every time a new file (fname) is selected, the list is created and should be appended to the previously created CSV file.

I am doing something really wrong, I can tell, and my head hurts. Please help.

update:

for x in range(0,count):
    list = parse_html(fname[x])
    with open('output.csv', 'wb') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(list)

error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 132: ordinal not in range(128)

Answer:

wr.writerow([c.encode('utf-8') for c in list])  # instead of wr.writerow(list)

You aren't doing anything wrong, and you don't need to remove the `u`. Why do you think you do? — Daniel Roseman, Nov 16 '17 at 09:46
Hi Daniel, my goal is to save it in a csv. I updated the post a bit, and i think this u is generating a problem — jorzylicious, Nov 16 '17 at 09:56
@jorzylicious: Your only problem is that the Python 2 `csv` library is not that good at handling Unicode. See the duplicate. — Martijn Pieters, Nov 16 '17 at 10:02
No! [Why sys.setdefaultencoding() will break code](https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/). Learn to use Unicode correctly instead. – Mark Tolonen, Nov 16 '17 at 16:51

score 0 · Answer 1 · answered Nov 16 '17 at 09:51

The u prefix merely indicates that the value is a Unicode string. There's nothing wrong with your code, and the string behaves correctly (as if it didn't have the u). The prefix only shows up when Python displays the repr of the string, which is what happens when you print a list that contains it; it is there to let you know it's a Unicode string.
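To make the difference concrete, here is a minimal sketch using the list from the question. The variable names (names, as_in_list, as_text) are made up for illustration, and the comments describe Python 2 behaviour.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; variable names are hypothetical.
names = [u'John', u'Muchia', u'Prozessoptimierung Fahrwiderst\xe4nde']

# Printing a whole list displays repr() of each element. On Python 2 the
# repr of a unicode string carries the u prefix and escapes like \xe4.
as_in_list = repr(names[2])

# Printing a single element displays the text itself (with the real ä).
as_text = names[2]

# The stored data is identical in both cases: \xe4 is one character.
char_count = len(as_text)                  # counts ä as one character
byte_count = len(as_text.encode('utf-8'))  # ä takes two bytes in UTF-8
```

So the u and the \xe4 are only a display detail of the list; nothing needs to be stripped from the strings before writing them out.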

Pedro von Hertwig Batista

score 0 · Answer 2 · answered Nov 16 '17 at 09:59

The problem is in your CSV output code. Since you are using Python 2, you should encode directly to utf-8 before writing:

    wr.writerow([c.encode('utf-8') for c in list])

Or, upgrade to Python 3 for more integrated unicode support.
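As a rough sketch of both options, the helper below appends one row per call and picks the right approach for the running interpreter. append_row is a hypothetical name, not part of the csv module, and the append mode is my assumption based on the question's stated goal of adding each new list to the existing file (the loop in the question reopens the file with 'wb', which truncates it each time).

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def append_row(path, row):
    """Append one row of unicode strings to a CSV file as UTF-8.

    Hypothetical helper for illustration only.
    """
    if PY2:
        # Python 2: csv works on byte strings, so encode each cell first
        # and open the file in binary append mode.
        with open(path, 'ab') as f:
            writer = csv.writer(f, quoting=csv.QUOTE_ALL)
            writer.writerow([cell.encode('utf-8') for cell in row])
    else:
        # Python 3: csv works on text, so let the file object do the
        # encoding; newline='' is what the csv docs recommend.
        with io.open(path, 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f, quoting=csv.QUOTE_ALL)
            writer.writerow(row)
```

Inside the loop from the question this would be called as append_row('output.csv', parse_html(fname[x])), so each file adds a line instead of overwriting the previous one.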

Daniel Roseman

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30: ordinal not in range(128) – jorzylicious Nov 16 '17 at 10:05
I included `import sys; reload(sys); sys.setdefaultencoding('utf-8')` and it now gives a CSV output with `wr.writerow([c.decode('utf-8') for c in list])`. Thanks for the help, Daniel. – jorzylicious Nov 16 '17 at 10:11
