how to get the data from a json format by using Pandas?

Question

I got json data like this:

     "data": [
        {
          "currency": "CNY",
          "standardDate": "2021-03-31T00:00:00+08:00",
          "reportDate": "2021-04-26T00:00:00+08:00",
          "reportType": "first_quarterly_report",
          "q": {
            "ps": {
              "np": {
                "t": 773183901
              }
            }
          },
          "date": "2021-03-31T00:00:00+08:00",
          "stockCode": "300413"
        }
      ],
    "data":["Like Above"] many times

when I use pd.DataFrame to get a df, the column 'q' value shows this:

     q
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}}

how can I only get the number cleverly in the 'q' column ?

perhaps `df['q'] = df['q'].apply(lambda d: d['ps']['np']['t'])`? — Gusti Adli, May 19 '21 at 06:56
yes! it works. thank you very much. by the way , why use "d" rather than "x"? where can i find some thing to study the function apply? — Youyou Hua, May 19 '21 at 07:01
You don't need to use .apply() with lambda function for this case, just use `.str[]`. See details below. — SeaBean, May 19 '21 at 08:12
No particular reason for using `"d"` over `"x"`, you can use `"x"` if you prefer so. There's a lot of reference for studying the `.apply()` method, but I would recommend this article from [datacamp](https://www.datacamp.com/community/tutorials/pandas-apply) — Gusti Adli, May 19 '21 at 10:25
Try not to use `.apply()` if there is other built-in Pandas function that you can use. `.apply()` especially with lambda function is not vectorized and not optimized for performance like other built-in pandas / numpy functions. Hence, it is often slow. See [this post](https://stackoverflow.com/q/54432583/15070697) for details. — SeaBean, May 19 '21 at 10:32

score 2 · Answer 1 · answered May 19 '21 at 07:40

2

The best way is to get the desired columns

df1 = pd.json_normalize(data, record_path=['q','ps','np'],  \
    meta=['currency','standardDate','reportDate','<All required fields>'])

answered May 19 '21 at 07:40

Pawan Jain

815
3
15

score 1 · Accepted Answer · answered May 19 '21 at 08:10

A more direct way of accessing the JSON / dict value by name is to use the .str[] accessor. Like this using column q to access value of t deep down the nested JSON:

df['q'] = df['q'].str['ps'].str['np'].str['t']

Demo

data = {'q':[
     {'ps': {'np': {'t': 773183901}}}, 
     {'ps': {'np': {'t': 773183902}}}, 
     {'ps': {'np': {'t': 773183903}}}, 
     {'ps': {'np': {'t': 773183904}}}]}

df = pd.DataFrame(data)

print(df)

                                  q
0  {'ps': {'np': {'t': 773183901}}}
1  {'ps': {'np': {'t': 773183902}}}
2  {'ps': {'np': {'t': 773183903}}}
3  {'ps': {'np': {'t': 773183904}}}


df['q'] = df['q'].str['ps'].str['np'].str['t']

print(df)

           q
0  773183901
1  773183902
2  773183903
3  773183904

how to get the data from a json format by using Pandas?

2 Answers2

Demo