0

I have the following json that I extracted using request with python and json.loads. The whole json basically repeats itself with changes in the ID and names. It has a lot of information but I`m just posting a small sample as an example:

"status":"OK",
   "statuscode":200,
   "message":"success",
   "apps":[
      {
         "id":"675832210",
         "title":"AGED",
         "desc":"No annoying ads&easy to play",
         "urlImg":"https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
         "urlImgWide":"https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
         "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
         "androidPackage":"com.agedstudio.freecell",
         "revenueType":"cpi",
         "revenueRate":"0.10",
         "categories":"Card",
         "idx":"2",
         "country":[
            "CH"
         ],
         "cityInclude":[
            "ALL"
         ],
         "cityExclude":[
            
         ],
         "targetedOSver":"ALL",
         "targetedDevices":"ALL",
         "bannerId":"675832210",
         "campaignId":"495181210",
         "campaignType":"network",
         "supportedVersion":"",
         "storeRating":"4.3",
         "storeDownloads":"10000+",
         "appSize":"34603008",
         "urlVideo":"",
         "urlVideoHigh":"",
         "urlVideo30Sec":"https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
         "urlVideo30SecHigh":"https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
         "offerId":"5825774"
      },

I dont need all that data, just a few like 'title', 'country', 'revenuerate' and 'urlApp' but I dont know if there is a way to extract only that. My solution so far was to make the json a dataframe and then drop the columns, however, I wanted to find an easier solution.

My ideal final result would be to have a dataframe with selected keys and arrays Does anybody know an easy solution for this problem?

Thanks

  • `id = your_json['apps'][0]['id']` where `your_json` is your parsed json `using json.loads`, and then you access to attributes using the square brackets (note the [0] index since 'apps' is an array) – ozerodb May 28 '21 at 21:35

3 Answers3

1

I assume you have that data as a dictionary, let's call it json_data. You can just iterate over the apps and write them into a list. Alternatively, you could obviously also define a class and initialize objects of that class.

EDIT: I just found this answer: https://stackoverflow.com/a/20638258/6180150, which tells how you can convert a list of dicts like from my sample code into a dataframe. See below adaptions to the code for a solution.

json_data = {
    "status": "OK",
    "statuscode": 200,
    "message": "success",
    "apps": [
        {
            "id": "675832210",
            "title": "AGED",
            "desc": "No annoying ads&easy to play",
            "urlImg": "https://test.com/pImg.aspx?b=675832&z=1041813&c=495181&tid=API_MP&u=https%3a%2f%2fcdna.test.com%2fbanner%2fwMMUapCtmeXTIxw_square.png&q=",
            "urlImgWide": "https://cdna.test.com/banner/sI9MfGhqXKxVHGw_rectangular.jpeg",
            "urlApp": "https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
            "androidPackage": "com.agedstudio.freecell",
            "revenueType": "cpi",
            "revenueRate": "0.10",
            "categories": "Card",
            "idx": "2",
            "country": [
                "CH"
            ],
            "cityInclude": [
                "ALL"
            ],
            "cityExclude": [

            ],
            "targetedOSver": "ALL",
            "targetedDevices": "ALL",
            "bannerId": "675832210",
            "campaignId": "495181210",
            "campaignType": "network",
            "supportedVersion": "",
            "storeRating": "4.3",
            "storeDownloads": "10000+",
            "appSize": "34603008",
            "urlVideo": "",
            "urlVideoHigh": "",
            "urlVideo30Sec": "https://cdn.test.com/banner/video/video-675832-30.mp4?rnd=1620699136",
            "urlVideo30SecHigh": "https://cdn.test.com/banner/video/video-675832-30_o.mp4?rnd=1620699131",
            "offerId": "5825774"
        },
    ]
}

filtered_data = []
for app in json_data["apps"]:
    app_data = {
        "id": app["id"],
        "title": app["title"],
        "country": app["country"],
        "revenueRate": app["revenueRate"],
        "urlApp": app["urlApp"],
    }
    filtered_data.append(app_data)

print(filtered_data)

# Output
d = [
    {
        'id': '675832210',
        'title': 'AGED',
        'country': ['CH'],
        'revenueRate': '0.10',
        'urlApp': 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='
    }
]

d = pd.DataFrame(filtered_data)
print(d)

# Output
          id title country revenueRate urlApp
0  675832210  AGED    [CH]        0.10 https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=
schilli
  • 1,700
  • 1
  • 9
  • 17
  • thanks for your reply, that is very helpful but how can I put the result in a dataframe? Cause the result is no longer a json, is a text – Mariana Rodrigues May 28 '21 at 21:57
  • Sorry I'm not familiar with dataframes. Maybe there is a way to transform lists to dataframes, then you can just forward the list. I hope you find a good solution, maybe my idea helps :) – schilli May 28 '21 at 22:03
  • @MarianaRodrigues I've just updated my answer with a way to transform the list of dicts into a dataframe after filtering out the desired fields. – schilli May 28 '21 at 22:33
  • Awesome, thank you so much. In the case where the json is nested, like this: "payouts":{ "total":2, "entries":[ { "network_offer_payout_revenue_id":3217, "custom_payout_overwrite":false, "network_custom_payout_revenue_setting_id":0} How that for app... would work in this case? – Mariana Rodrigues May 29 '21 at 09:40
0

if your endgame is dataframe, just load the dataframe and take the columns you want:

setting the json to data

df = pd.json_normalize(data['apps'])

yields

          id title                          desc                                             urlImg  ... urlVideoHigh                                      urlVideo30Sec                                  urlVideo30SecHigh  offerId
0  675832210  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...  ...               https://cdn.test.com/banner/video/video-675832...  https://cdn.test.com/banner/video/video-675832...  5825774

[1 rows x 28 columns]

then if you want certain columns:

df_final = df[['title', 'desc', 'urlImg']]

  title                          desc                                             urlImg
0  AGED  No annoying ads&easy to play  https://test.com/pImg.aspx?b=675832&z=1041813&...
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14
  • Yes, that is what I`m doing already, but I`m looking for a solution to do before creating the dataframe. I have several Json structures similar to this one when I have to join all together so if I have a way to select only what is necessary before creating a df it would be easier – Mariana Rodrigues May 28 '21 at 21:59
  • why not collect all the jsons into a list, then read that into dataframe? or create individual dataframes and concatenate them? It's not necessarily easier to what you are asking. the answer by schilli is the approach your asking for but you still aren't getting what you want. you are going to have to do some sort of transformations or collections of the individual json structures. – Jonathan Leon May 28 '21 at 22:06
0

use a dictionary comprehension to extract a dictionary of key/value pairs you want

import json

json_string="""{
     "id":"675832210",
     "title":"AGED",
     "desc":"No annoying ads&easy to play",
     "urlApp":"https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q=",
     "revenueRate":"0.10",
     "categories":"Card",
     "idx":"2",
     "country":[
        "CH"
     ],
     "cityInclude":[
        "ALL"
     ],
     "cityExclude":[
        
     ]
  }"""

 json_dict = json.loads(json_string)

 filter_fields=['title','country','revenueRate','urlApp']    

 dict_result = { key: json_dict[key]  for key in json_dict if key in filter_fields}

 json_elements = []
 for key in dict_result:
     json_elements.append((key,json_dict[key]))

 print(json_elements)

output:

 [('title', 'AGED'), ('urlApp', 'https://admin.test.com/appLink.aspx?b=675832&e=1041813&tid=API_MP&sid=2c5cee038cd9449da35bc7b0f53cf60f&q='), ('revenueRate', '0.10'), ('country', ['CH'])]
Golden Lion
  • 3,840
  • 2
  • 26
  • 35