I am continually getting this type of error when writing parsed content to a CSV in Python 2.7: UnicodeEncodeError: 'ascii' codec can't encode characters in position 570-579: ordinal not in range(128)
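
For reference, the error itself comes from Python 2 implicitly encoding a unicode string with the default ascii codec, which fails on any non-ASCII character. A minimal reproduction (the string here is just an illustrative example, not my actual data):

    >>> u'r\xe9sum\xe9'.encode('ascii')
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)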

After some research I found the examples in the docs, as well as a similar question on SO (Read and Write CSV files including unicode with Python 2.7), but I couldn't get mine to work with the following code:

    data = {
        'scrapeUrl': url,
        'model': final_model_num,
        'title': final_name, 
        'description': final_description, 
        'price': str(final_price), 
        'image': final_first_image, 
        'additional_image': final_images,
        'quantity': '1', 
        'subtract': '1', 
        'minimum': '1', 
        'status': '1', 
        'shipping': '1' 
    } 
    with open("local/file1.csv", "w") as f:
        writer=csv.writer(f, delimiter=",")
        writer.writerows([data.keys()])
        for row in zip(*data.values()):
            row=[s.encode('utf-8') for s in row]
            writer.writerows([row])

My version seems to write only the first character of each variable to each row. As a bit of troubleshooting I tried removing the zip(*...) unpacking, but that resulted in all of the data being printed correctly, just to one column of the CSV rather than one row.

jcuwaz

1 Answer


What you essentially have is a set of key-value pairs, so there is only one set of values, which ends up being decomposed into individual character 'rows' when zip is called:

>>> zip(*['abc', 'def', 'ghi'])
[('a', 'd', 'g'), ('b', 'e', 'h'), ('c', 'f', 'i')]

Furthermore, the shortest value in your example has a length of 1, which explains why you got a single row with just the first character of each value as your output.
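
For instance, because the '1' strings in your dict have a length of 1, zip stops after the first character of every value:

    >>> zip(*['1', 'abc', 'defgh'])
    [('1', 'a', 'd')]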

What you want to do is something like

    with open("local/file1.csv", "w") as f:
        writer = csv.writer(f, delimiter=",")
        writer.writerow(data.keys())
        writer.writerow([s.encode('utf8') for s in data.values()])
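
The keys and values are written in two separate calls here; an ordinary dict's keys() and values() are guaranteed to come out in matching order as long as the dict is not modified in between, which is the case above, so the value row will line up with the header row.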

Alternatively, use codecs.open with an encoding to get around having to manually encode unicode into str.

    with codecs.open("local/file1.csv", "w", encoding='utf8') as f:
        writer = csv.writer(f, delimiter=",")
        writer.writerow(data.keys())
        writer.writerow(data.values())
metatoaster
  • Thanks for the quick response; ideally I would like to ignore errors entirely and keep the ASCII encoding. I tried your last option and the same error persists: `with codecs.open("local/file1.csv", "w", encoding='ascii', errors='ignore') as f: writer = csv.writer(f, delimiter=",") writer.writerows([data.keys()]) writer.writerows([data.values()])` – jcuwaz May 10 '14 at 03:01
  • The original version with the edit you gave still causes the problem; the difference now is that all of the text is visible, but still only one row per letter: `with open("local/file1.csv", "w") as f: writer=csv.writer(f, delimiter=",") writer.writerows([data.keys()]) writer.writerows([s.encode('ascii', 'ignore') for s in data.values()])` – jcuwaz May 10 '14 at 03:08
  • Whoops, should have used `writerow` for a singular row. Fixed the answer. – metatoaster May 10 '14 at 03:18
  • One last quick question, meta: this works, but it seems to be rewriting the first line of the CSV for every iteration of the program. I am running the same scrape method on a set of urls using the threaded module like so: `threaded(urls, scrape_csv, num_threads=5)`; how can I get the csv writer to add a new line for each url? – jcuwaz May 10 '14 at 11:09
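
On that last comment: opening the file with mode "w" truncates it, so every call to the scrape rewrites the file from scratch. A rough sketch of one way around that, assuming a per-url helper called from scrape_csv (the write_row name and the lock are assumptions, not part of the original code): write the header once before the threads start, then append one row per url.

    import csv
    import threading

    csv_lock = threading.Lock()  # serialise writes coming from the worker threads

    fieldnames = ['scrapeUrl', 'model', 'title', 'description', 'price',
                  'image', 'additional_image', 'quantity', 'subtract',
                  'minimum', 'status', 'shipping']

    # Write the header once, before threaded(urls, scrape_csv, num_threads=5) runs.
    with open("local/file1.csv", "w") as f:
        csv.writer(f, delimiter=",").writerow(fieldnames)

    def write_row(data):
        """Hypothetical helper: called from scrape_csv with the data dict for one url."""
        with csv_lock:
            # "a" appends instead of truncating, so each url adds a new line.
            with open("local/file1.csv", "a") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=",")
                writer.writerow({k: v.encode('utf8') for k, v in data.items()})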