1

I got json data like this:

     "data": [
        {
          "currency": "CNY",
          "standardDate": "2021-03-31T00:00:00+08:00",
          "reportDate": "2021-04-26T00:00:00+08:00",
          "reportType": "first_quarterly_report",
          "q": {
            "ps": {
              "np": {
                "t": 773183901
              }
            }
          },
          "date": "2021-03-31T00:00:00+08:00",
          "stockCode": "300413"
        }
      ],
    "data":["Like Above"] many times

when I use pd.DataFrame to get a df, the column 'q' value shows this:

     q
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}} 
     {'ps': {'np': {'t': 773183901}}} 

how can I only get the number cleverly in the 'q' column ?

    q
    773183901
    773183901
    773183901
    773183901
dejanualex
  • 3,872
  • 6
  • 22
  • 37
Youyou Hua
  • 49
  • 5
  • 2
    perhaps `df['q'] = df['q'].apply(lambda d: d['ps']['np']['t'])`? – Gusti Adli May 19 '21 at 06:56
  • yes! it works. thank you very much. by the way , why use "d" rather than "x"? where can i find some thing to study the function apply? – Youyou Hua May 19 '21 at 07:01
  • You don't need to use .apply() with lambda function for this case, just use `.str[]`. See details below. – SeaBean May 19 '21 at 08:12
  • No particular reason for using `"d"` over `"x"`, you can use `"x"` if you prefer so. There's a lot of reference for studying the `.apply()` method, but I would recommend this article from [datacamp](https://www.datacamp.com/community/tutorials/pandas-apply) – Gusti Adli May 19 '21 at 10:25
  • Try not to use `.apply()` if there is other built-in Pandas function that you can use. `.apply()` especially with lambda function is not vectorized and not optimized for performance like other built-in pandas / numpy functions. Hence, it is often slow. See [this post](https://stackoverflow.com/q/54432583/15070697) for details. – SeaBean May 19 '21 at 10:32

2 Answers2

2

The best way is to get the desired columns

df1 = pd.json_normalize(data, record_path=['q','ps','np'],  \
    meta=['currency','standardDate','reportDate','<All required fields>'])
Pawan Jain
  • 815
  • 3
  • 15
1

A more direct way of accessing the JSON / dict value by name is to use the .str[] accessor. Like this using column q to access value of t deep down the nested JSON:

df['q'] = df['q'].str['ps'].str['np'].str['t']

Demo

data = {'q':[
     {'ps': {'np': {'t': 773183901}}}, 
     {'ps': {'np': {'t': 773183902}}}, 
     {'ps': {'np': {'t': 773183903}}}, 
     {'ps': {'np': {'t': 773183904}}}]}

df = pd.DataFrame(data)

print(df)

                                  q
0  {'ps': {'np': {'t': 773183901}}}
1  {'ps': {'np': {'t': 773183902}}}
2  {'ps': {'np': {'t': 773183903}}}
3  {'ps': {'np': {'t': 773183904}}}


df['q'] = df['q'].str['ps'].str['np'].str['t']

print(df)

           q
0  773183901
1  773183902
2  773183903
3  773183904
SeaBean
  • 22,547
  • 3
  • 13
  • 25