Unicode text in pandas dataframe cannot parse to JSON

Question

I'm trying to write Python code to build a nested JSON file from a flat table in a pandas dataframe. I created a dictionary of dictionaries from the dataframe. When I try to export the dict to JSON, I get an error saying that the unicode text is not a string. How can I convert dictionaries with unicode strings to JSON?

My current code is:

data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
columnList = tuple(data[0:])
    for index, row in data.iterrows():
    dataRow = tuple(row)
    rowDict = dict(zip(dataRow[2:],columnList[2:]))

    memberId = str(tuple(row[1]))
    teamName = str(tuple(row[0]))

    memberDict1 = {memberId[1:2]:rowDict}
    memberDict2 = {teamName:memberDict1}

This produces a dict of dicts where each row looks like this:

'(1L,)': {'0': {(u'Doe',): (u'lastname',), (u'John',): (u'firstname',), (u'none',): (u'mobile',), (u'916-555-1234',): (u'phone',), (u'john.doe@wildlife.net',): (u'email',), (u'Anon',): (u'orgname',)}}}

But when I try to dump to JSON, the unicode text can't be parsed as strings, so I get this error:

TypeError: key (u'teamname',) is not a string

How can I convert my nested dicts to JSON without invoking the error?

It's not that I don't understand the message, it's that I don't know if I'm even on the right track making a dict of dicts for export into the JSON format I need. If this is the right track, how can I convert my nested dicts without invoking the error? — spaine, Jun 09 '16 at 21:02
I've made some edits so hopefully this question is more concise and understandable now. — spaine, Jun 10 '16 at 20:06
Note that "teamname" is not present anywhere in your sample row. It would be helpful if the error and the sample data matched. — Paul Roub, Jun 10 '16 at 21:04
Also, note that `(u'teamname',)` isn't a string -- it's a tuple whose first *element* is a unicode string, which isn't going to work as a JSON key, regardless of character encoding. — Paul Roub, Jun 10 '16 at 21:09
As Paul said, you just have to reference index 0 of every "string" you have now, as they are not strings, but strings inside single-element tuples. Also, your indentation is off (look at the for loop). — Andras Deak -- Слава Україні, Jun 10 '16 at 21:13
Agree with @Paul here. You can clean it up by using a dict comprehension like `d['(1L,)']['0'] = {i[0]:v[0] for i,v in d['(1L,)']['0'].items()}`. Your final code would look like [this](https://ideone.com/8eF93V). [This](http://stackoverflow.com/q/12734517/4099593) *is* a related post to read. — Bhargav Rao, Jun 10 '16 at 21:33
@Andras I *think* the for loop is indented by an additional 4 spaces while pasting the code here. — Bhargav Rao, Jun 10 '16 at 21:35
@BhargavRao I never said that it's incomprehensible, only that it's off:P — Andras Deak -- Слава Україні, Jun 10 '16 at 21:38
'teamname' is the column heading of the first column in my dataframe. When I create lists (or tuples) from the dataframe rows, the items in the rows are added as (u'dataItem',) - like you said, a tuple whose first element is a unicode string. Why are they coming in as tuples, and how can I keep them as simple strings? — spaine, Jun 10 '16 at 21:45
@spaine the problem is not the `teamName` *variable*, it's the `(u'teamname',)` element *in `rowDict`*! Look at your definition of `rowDict`: you have stuff like `columnList[2:3]`. If `columnList` is a `tuple`, this is then equivalent to `(columnList[2],)`, a one-element tuple. What you need instead is simply `columnList[2]` etc., referenced with a single index. Your `teamName` variable is `str(tuple(...))`, and correspondingly it looks like this: `'(1L,)'`, which is again probably not what you expect, but a string nonetheless. — Andras Deak -- Слава Україні, Jun 10 '16 at 21:55
@Andras Thanks, I will simplify my indexes to one digit. For reference, this code is an attempt to solve the problem I posted here: http://stackoverflow.com/questions/37713329/use-python-to-build-nested-json-from-flat-table — spaine, Jun 10 '16 at 22:15
@spaine you edited your code, but you kept the error message. What's going on? Also, most of your work could be done by `rowDict = dict(zip(dataRow[2:],columnList[2:]))`. — Andras Deak -- Слава Україні, Jun 10 '16 at 22:27
I love it how you edited your question *yet again* based on my previous comment, but you entirely ignore the first half of the very same comment. — Andras Deak -- Слава Україні, Jun 10 '16 at 22:45
@Andras I've incorporated your rowDict - that's slick. I'm still getting the error message because I haven't figured out how to deal with the problem of strings being in unicode. Sorry, I'm very new to this and don't understand what I'm missing. — spaine, Jun 10 '16 at 22:48
If the error is still `TypeError: key (u'teamname',) is not a string` then the problem is *not* unicode. I don't expect unicode strings to cause any problems for you. And not with that error message! — Andras Deak -- Слава Україні, Jun 10 '16 at 22:49
OK, so unicode is not the issue. Am I even barking up the right tree trying to build dicts of dicts to create a nested JSON? — spaine, Jun 10 '16 at 23:16
You should probably ask this on your other question. I've never used JSON files, nor the corresponding libraries in python. But hey: you can just set up a simple test case for yourself by hand (call it a [MCVE]), and see what happens when you try to dump it into a JSON. — Andras Deak -- Слава Україні, Jun 10 '16 at 23:38
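
The test case suggested in the comments can be sketched roughly as follows. This is only an illustration, not the asker's actual code: it is written for Python 3 (the question uses Python 2), and the `columns` and `rows` values are hypothetical stand-ins for the spreadsheet, modelled on the sample row above. The helper name `build_nested` is also made up for this sketch. With a real dataframe, `list(data.columns)` and `data.itertuples(index=False)` should be able to supply the same two inputs.

```python
import json

# Hypothetical stand-in for the spreadsheet: column names plus one row,
# modelled on the sample data in the question.
columns = ["teamname", "memberid", "lastname", "firstname",
           "mobile", "phone", "email", "orgname"]
rows = [
    ("0", 1, "Doe", "John", "none", "916-555-1234",
     "john.doe@wildlife.net", "Anon"),
]

# 1. Reproduce the failure: a tuple is not a valid JSON object key,
#    no matter what kind of string it contains.
try:
    json.dumps({("teamname",): "0"})
    tuple_key_failed = False
except TypeError:
    tuple_key_failed = True


# 2. Build the nested dict using plain strings as keys.
def build_nested(columns, rows):
    """Return {team: {member: {column: value}}} from flat rows."""
    nested = {}
    for row in rows:
        # Single-index lookups give scalars, not one-element tuples.
        team, member = str(row[0]), str(row[1])
        # Column names become the keys, cell values the values.
        record = dict(zip(columns[2:], row[2:]))
        nested.setdefault(team, {})[member] = record
    return nested


nested = build_nested(columns, rows)
json_text = json.dumps(nested, indent=2, sort_keys=True)
print(json_text)
```

Two points are worth noting. The `zip` call puts the column names first, so they end up as keys; the question's `rowDict` has the arguments the other way round, which makes the cell values the keys. And `setdefault` lets several members accumulate under the same team, which is what makes the result nested rather than one dict per row.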
