0

I have researched this problem and found out that Pandas' explode function does not work on multiple columns, however, I have seen a few questions submitted on StackOverflow however, none of them seem to work for me.

Dataset:

j = { 
    "_id" : "5c45", 
    "user" : 5, 
    "ids" : [
        "1019", 
        "1021", 
        "1162"
    ], 
    "roles" : ["2d7f"]
}

Current Script:

root = json_normalize(j)
x = (root.applymap(type) == list).all()
y = x.index[x].tolist()
root = root.apply(lambda x: [str(v).split(',') for v in x]).apply(pd.Series.explode)

print(root)

I tried this solution here, but I get a value error:

ValueError: cannot reindex from a duplicate axis

Expected Result:

_id,user,ids,roles
5c45,5,1019,2d7f
5c45,5,1021,2d7f
5c45,5,1162,2d7f

Is there a simple, yet effective workaround to this?

jcoke
  • 1,555
  • 1
  • 13
  • 27

1 Answers1

2

Try record_path and meta options:

pd.json_normalize(j, record_path=['ids'], meta=['_id','user','roles'])

Output:

      0   _id user roles
0  1019  5c45    5  2d7f
1  1021  5c45    5  2d7f
2  1162  5c45    5  2d7f

For a somewhat dynamic solution, try flatten the singletons:

pd.DataFrame({k:v[0] if isinstance(v, list) and len(v) == 1 else v
              for k,v in j.items()
             })
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • is there a dynamic version of this, without explicitly specifying the column names, as it may differ from each json object? – jcoke Feb 16 '21 at 16:15