
I created a function, parse_html(param), that returns a list like the one below:

    list = [u'John', u'Muchia', u'Prozessoptimierung Fahrwiderst\xe4nde']

If I print list[2] inside my function, it gives me Prozessoptimierung Fahrwiderstände, which is perfect, but the same string appears differently when it is inside a list.

The problem arises when I return the whole list (return list): I want to avoid the 'u'. I want to store a list of strings in which Unicode characters like ä, ö and ü also appear correctly.

fname[x] is the source of HTML file number x, where x is incremented from 0 up to count (the number of files).

    newlist = []
    list = parse_html(fname[7])       # parse one HTML file
    for row in list:
        drow = row.encode('utf-8')    # unicode -> UTF-8 byte string
        newlist.append(drow)
    print newlist
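To see what the encoding loop above produces, here is a minimal Python 3 sketch (the list literal is copied from the question; everything else is illustrative). Encoding turns each ä into the two UTF-8 bytes \xc3\xa4, which is why printing newlist shows escape sequences instead of the letters. Decoding with the same codec reverses it.

```python
# Python 3 sketch: the question's list (the u'' prefix is optional in Python 3).
items = ['John', 'Muchia', 'Prozessoptimierung Fahrwiderst\xe4nde']

# Encoding turns each text string into UTF-8 bytes; 'ä' (U+00E4) becomes b'\xc3\xa4'.
encoded = [s.encode('utf-8') for s in items]
print(encoded)

# Decoding the bytes with the same codec restores the original text.
decoded = [b.decode('utf-8') for b in encoded]
print(decoded == items)  # prints True
```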

The goal is to save the returned list to a CSV file. Every time a new file (fname) is selected, its list is created and should be appended to the CSV file created previously.
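Appending rather than overwriting comes down to the file mode: 'w' or 'wb' truncates the file every time it is opened, while 'a' writes at the end. A minimal Python 3 sketch, where append_row is a hypothetical helper name chosen for illustration:

```python
import csv


def append_row(path, fields):
    """Hypothetical helper: append one row to a CSV file, creating the file if needed."""
    # 'a' appends instead of truncating; newline='' is what the csv docs recommend
    # for files passed to csv.writer, and encoding='utf-8' lets ä, ö and ü be
    # written without any manual .encode() call.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerow(fields)
```

Calling append_row('output.csv', parse_html(fname[x])) once per file would then accumulate one row per HTML file, even across separate runs of the script.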

I know I am doing something really wrong, and my head hurts. Please help.

Update:

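    import csv

    # Open the file once, outside the loop, so rows from earlier files are not overwritten.
    with open('output.csv', 'wb') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        for x in range(0, count):
            list = parse_html(fname[x])
            wr.writerow(list)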

Error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 132: ordinal not in range(128)
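The error comes from an implicit conversion to ASCII: in Python 2 the csv writer effectively turns each unicode value into a byte string with the default 'ascii' codec, which has no byte for ü (u'\xfc'). The same failure can be reproduced explicitly; a Python 3 sketch, where first_non_ascii and the sample word are made up for illustration:

```python
def first_non_ascii(text):
    """Hypothetical helper: return the index of the first character 'ascii' rejects, or None."""
    try:
        text.encode('ascii')  # ASCII only covers code points 0-127
    except UnicodeEncodeError as err:
        # Same exception type and reason ("ordinal not in range(128)") as the traceback.
        print(err)
        return err.start
    return None


print(first_non_ascii('Prüfung'))      # ü is rejected by the ascii codec
print('Prüfung'.encode('utf-8'))       # UTF-8 can represent every character
```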

Answer:

    wr.writerow([c.encode('utf-8') for c in list])  # instead of wr.writerow(list)
Mark Tolonen
jorzylicious

2 Answers


The u prefix merely indicates that the string is a Unicode string. There's nothing wrong with your code, and the strings will behave correctly in code (as if they didn't have the u). The prefix only shows up when you print the whole list, because a list is printed using the repr() of each element, and repr() marks Unicode strings with u.
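The difference between the two outputs is str() versus repr(). Python 3 dropped the u prefix and shows printable non-ASCII characters as-is in repr(), but its ascii() builtin escapes them the way Python 2's repr() did, so it can mimic the output from the question (minus the u). A minimal sketch using the question's list:

```python
items = ['John', 'Muchia', 'Prozessoptimierung Fahrwiderst\xe4nde']

print(items[2])          # str(): the readable text, Prozessoptimierung Fahrwiderstände
print(ascii(items[2]))   # escaped form: 'Prozessoptimierung Fahrwiderst\xe4nde'
print(', '.join(items))  # join the strings yourself to show the list without quotes or escapes
```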


The problem is in your CSV output code. Python 2's csv module does not support Unicode input, so you should encode each value to UTF-8 before writing it:

    wr.writerow([c.encode('utf-8') for c in list])

Alternatively, upgrade to Python 3, whose csv module works with Unicode text directly.
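In Python 3 the manual encode step disappears: open the file in text mode with an explicit encoding and pass the strings straight to writerow. A sketch of the question's loop under that assumption; parse_html and fname come from the question but are stubbed here with hypothetical stand-ins, and write_all is an illustrative name, so the block runs on its own:

```python
import csv

# Hypothetical stand-ins for the question's fname list and parse_html() function.
fname = ['page0.html', 'page1.html']


def parse_html(path):
    return ['John', 'Muchia', 'Prozessoptimierung Fahrwiderst\xe4nde']


def write_all(out_path):
    # Open once, outside the loop, so each file's row is added instead of overwriting the last.
    with open(out_path, 'w', newline='', encoding='utf-8') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        for path in fname:
            wr.writerow(parse_html(path))  # no .encode() needed in Python 3
```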

Daniel Roseman
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 30: ordinal not in range(128) – jorzylicious Nov 16 '17 at 10:05
  • I included import sys; reload(sys); sys.setdefaultencoding('utf-8'), and with wr.writerow([c.decode('utf-8') for c in list]) it now produces CSV output. Thanks for the help, Daniel. – jorzylicious Nov 16 '17 at 10:11
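The UnicodeDecodeError in the first comment is the mirror image of the original error, and most likely comes from decoding bytes that are already UTF-8 with the 'ascii' codec (in Python 2, calling .encode() on a byte string triggers exactly such an implicit ASCII decode). Byte 0xc3 is the first byte of a UTF-8 encoded ä, ö or ü. A Python 3 sketch, where try_decode and the sample word are illustrative:

```python
def try_decode(data, codec):
    """Hypothetical helper: decode bytes with the given codec, returning None on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as err:
        print(err)  # reports the offending byte and its position
        return None


data = 'Fahrwiderstände'.encode('utf-8')  # UTF-8 bytes; ä becomes 0xc3 0xa4
print(try_decode(data, 'ascii'))  # fails: ASCII rejects every byte above 0x7f
print(try_decode(data, 'utf-8'))  # the matching codec restores the text
```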