0

I'm trying to convert a nested JSON file to a dataframe. The dataframe now looks like this:

product shelfId
{'product': [{'id': '111', 'quantity': 10}]} aaa
{'product': [{'id': '222', 'quantity': 0}]} bbb

Now I want to flatten the product column, and get a df like:

productId productQuantity shelfId
111 10 aaa
222 0 bbb

Is there a way to do it?

Erin L
  • 95
  • 1
  • 5
  • what is the output of `type(df.loc[0,'product'])`? – Anurag Dabas Apr 22 '21 at 18:34
  • to be pedantic, that is not a JSON object. That is a Python `dict` – juanpa.arrivillaga Apr 22 '21 at 18:34
  • Doesn't this answer your question? https://stackoverflow.com/questions/46131731/split-dictionary-into-individual-columns-in-a-df – sander Apr 22 '21 at 18:36
  • ok, I see...Thanks! – Erin L Apr 22 '21 at 18:38
  • I tried the other methods, df = pd.json_normalize(df['product']) and df = pd.DataFrame(df['product'].values.tolist(), index=df.index). But encountered another problem. I got empty dicts within the product column, something like: {'product': []}. And when splitting up, it will show 'list' object has no attribute 'values'. Is there a way to solve this? (sorry a bit new to pandas) – Erin L Apr 22 '21 at 19:33

1 Answers1

0

I would apply a custom function to create a list of dict then convert to dataframe and add to old df.

def flatten(d):
    newd = {}
    for k, v in d.items():
        for ik, iv in v.items():
            newd[f'{k}{ik.title()}'] = iv
    return newd

cols = pd.DataFrame([flatten(v) for v in df['product']])
df = df.concat([df, cols]).drop(columns=['product'])
Eric Truett
  • 2,970
  • 1
  • 16
  • 21