0

I have JSON data like

{'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

I would like to extract three columns (Author, Marketing, and Media) and turn it into data like below:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Thank you for any help in advance!

AMC
  • 2,642
  • 7
  • 13
  • 35
Kasu
  • 39
  • 1
  • 7
  • What is the issue, exactly? Have you tried anything, done any research? Please see [ask], [help/on-topic]. – AMC Apr 26 '20 at 02:14
  • Does this answer your question? [JSON to pandas DataFrame](https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe) – AMC Apr 26 '20 at 02:17

2 Answers2

1

Maybe you should explicitly flatten the nested JSON data you post.

JSON Structure:

{
  "Author": [
    {
      "name": "John",
      "Agency": {
        "Marketing": [
          {
            "name": "SD_SM_14"
          }
        ],
        "Media": [
          {
            "codeX": "s_wse@2"
          }
        ]
      }
    }
  ]
}

What you want:

Author  Marketing  Media
John    SD_SM_14   s_wse@2

Here is the code:

import pandas as pd
from typing import Dict

def flatten(data: Dict):
    for key, value in data.items():
        for res in value:
            # assume that there is only one key in `res`
            yield key, next(iter(res.values()))

def func(data: Dict):
    for author in data['Author']:
        name = author['name']
        agency = author['Agency']
        yield dict([('Author', name)] + list(flatten(agency)))

df = pd.DataFrame(func(data))
ZeFeng Zhu
  • 124
  • 1
  • 5
  • thank you, your code works too. How to tell there is only one key? – Kasu Apr 26 '20 at 23:56
  • @kasu Since your desired output uses the corresponding value of the key `"name"` as the value of `"Marketing"`, I assumed that there is only one key("name") in the Dict. So does `"Media"`. Besides, if there is more than one element in the list of `"Marketing"`, this code has to be modified. – ZeFeng Zhu Apr 27 '20 at 01:04
-2

I could not find a better way, but once solution can be to first extract name column from the Author and the explode the lists so that you have json when again you use json_normalize to extract the required columns:

In [38]: dic = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@
    ...: 2'}]}}]}

In [39]: df = pd.DataFrame(dic)

In [40]: df
Out[40]:
                                              Author
0  {'name': 'John', 'Agency': {'Marketing': [{'na...

In [41]: df = pd.json_normalize(df.Author)

In [42]: df
Out[42]:
   name        Agency.Marketing            Agency.Media
0  John  [{'name': 'SD_SM_14'}]  [{'codeX': 's_wse@2'}]

In [43]: df1 = df.explode('Agency.Marketing')

In [44]: df1
Out[44]:
   name      Agency.Marketing            Agency.Media
0  John  {'name': 'SD_SM_14'}  [{'codeX': 's_wse@2'}]

In [45]: df1 = df1.explode('Agency.Media')
In [47]: df2 = pd.json_normalize(df1['Agency.Marketing'])

In [48]: df2
Out[48]:
       name
0  SD_SM_14

In [49]: df3 = pd.json_normalize(df1['Agency.Media'])

In [50]: df3
Out[50]:
     codeX
0  s_wse@2

In [51]: main_df = pd.concat([df1,df2,df3], axis=1)

In [52]: main_df
Out[52]:
   name      Agency.Marketing          Agency.Media      name    codeX
0  John  {'name': 'SD_SM_14'}  {'codeX': 's_wse@2'}  SD_SM_14  s_wse@2

In [53]: main_df.drop(['Agency.Marketing','Agency.Media'],inplace=True,axis=1)

In [54]: main_df
Out[54]:
   name      name    codeX
0  John  SD_SM_14  s_wse@2

Update:

if you have imported the json_normalize method:

just use json_normalize instead of pd.json_normalize

Rajat Mishra
  • 3,635
  • 4
  • 27
  • 41