
I have a huge amount of JSON data that I need to transfer to Excel (10,000 or so rows and 20-ish columns). I'm using csv. My code:

import json
import csv
import codecs
import urllib2

x = json.load(urllib2.urlopen('#####'))
f = csv.writer(codecs.open("fsbmigrate3.csv", "wb+", encoding='utf-8'))
y = #my headers
f.writerow(y)
for row in x:
    f.writerow(row.values())

UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128) is what comes up.

I have tried encoding the JSON data with dict((k.encode('utf-8'), v.encode('utf-8')) for (k, v) in x), but there is too much data to handle.

Any ideas on how to pull this off? (Apologies for the lack of SO convention; it's my first post.)

The full traceback is:

Traceback (most recent call last):
  File "C:\Users\bryand\Desktop\bbsports_stuff\gba2.py", line 22, in <module>
    f.writerow(row.values())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 0: ordinal not in range(128)

  • I should add that the code works and does create a file and write data until the point where an unrecognizable character comes up. – niall downey Aug 18 '15 at 13:40
  • Please show us the *full* traceback. – Kevin Aug 18 '15 at 13:45
  • possible duplicate of [UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)](http://stackoverflow.com/questions/9942594/unicodeencodeerror-ascii-codec-cant-encode-character-u-xa0-in-position-20) – Peter Wood Aug 18 '15 at 13:46
  • Why do you need to create a dictionary of the whole data? – Peter Wood Aug 18 '15 at 13:47
  • Which line is `line 22`? – Peter Wood Aug 18 '15 at 13:48
  • The data contains user info that needs to be transferred from one piece of software that we will no longer be using to another. Basically it's just what I've been told to do. – niall downey Aug 18 '15 at 13:51
  • Line 22 is `f.writerow(row.values())`. – niall downey Aug 18 '15 at 13:52
  • Why are you trying to encode the keys of `x`? You only need the values: `vals = [v.encode('utf-8') for v in row.values()]` - and you can do it on each row loop iteration. Not sure if that helps enough to make it work... worth a shot. – minboost Aug 18 '15 at 15:13
  • What I've done is just write it to a text file and then open it with Excel as a CSV. Not solving my issue from a programming perspective, but it seemed to work. – niall downey Aug 19 '15 at 12:56

1 Answer


Since you didn't specify, here's a Python 3 solution; the Python 2 solution is much more painful. I've included some short sample data with non-ASCII characters:

#!python3
import json
import csv

json_data = '[{"a": "\\u9a6c\\u514b", "c": "somethingelse", "b": "something"}, {"a": "one", "c": "three", "b": "two"}]'
data = json.loads(json_data)

with open('fsbmigrate3.csv','w',encoding='utf-8-sig',newline='') as f:
    w = csv.DictWriter(f,fieldnames=sorted(data[0].keys()))
    w.writeheader()
    w.writerows(data)

The utf-8-sig codec makes sure a byte order mark (BOM) is written at the start of the output file, since Excel will otherwise assume the local ANSI encoding.
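You can verify the BOM behavior yourself; a minimal sketch (writing to an in-memory buffer rather than a file, for illustration) showing that the utf-8-sig codec prepends the three BOM bytes EF BB BF:

```python
import io

# Wrap a bytes buffer in a text layer using the utf-8-sig codec,
# the same codec the answer uses for the CSV file.
buf = io.BytesIO()
wrapper = io.TextIOWrapper(buf, encoding='utf-8-sig', newline='')
wrapper.write('a,b\r\n')
wrapper.flush()

raw = buf.getvalue()
print(raw[:3])  # b'\xef\xbb\xbf' - the UTF-8 BOM Excel looks for
```

Decoding the buffer back with the same codec strips the BOM again, so round-tripping is transparent to Python code.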

Since you have json data with key/value pairs, using DictWriter allows the headers to be specified; otherwise, the header order isn't predictable.
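To illustrate the header-order point, a minimal sketch (with made-up sample rows) showing that an explicit fieldnames list pins the column order no matter how the keys appear in each dict:

```python
import csv
import io

# Hypothetical rows; note the keys are not in the desired column order.
rows = [{"a": "one", "c": "three", "b": "two"}]

buf = io.StringIO()
# fieldnames fixes both the header row and the value order per row.
w = csv.DictWriter(buf, fieldnames=["a", "b", "c"])
w.writeheader()
w.writerows(rows)

print(buf.getvalue())  # a,b,c / one,two,three
```

Passing sorted(data[0].keys()), as in the answer, is just one way to get a deterministic ordering; any explicit list works, and it also lets you drop or reorder columns.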

Mark Tolonen