How do I properly unwrap this json response in Python to get the data I need into a Pandas DataFrame?

Question

So, I'm working with Cisco Prime Infrastructure's API and I'm at the end of my rope. The script I'm writing is supposed to take a report from its getReport operation, decode it (not sure if that's the right word), and put it into a Pandas DataFrame. The problem is, there's junk wrapped around it. Here's an example of the response json:

{
  "mgmtResponse" : {
    "@requestUrl" : "https://localhost/webacs/api/v4/op/reportService/getReport?reportTitle=MyReport",
    "@responseType" : "operation",
    "@rootUrl" : "https://localhost/webacs/api/v4/op",
    "reportDataDTO" : [ {
      "childReports" : {
        "childReport" : [ ]
      },
      "dataRows" : {
        "dataRow" : [ {
          "entries" : {
            "entry" : [ {
              "attributeName" : "String value",
              "dataValue" : "String value",
              "displayName" : "String value"
            }, {
              "attributeName" : "Another string value",
              "dataValue" : "Another string value",
              "displayName" : "Another string value"
            } ]
          }
        }, {
          "entries" : {
            "entry" : [ {
              "attributeName" : "String value",
              "dataValue" : "String value",
              "displayName" : "String value"
            }, {
              "attributeName" : "Another string value",
              "dataValue" : "Another string value",
              "displayName" : "Another string value"
            } ]
          }
        } ]
      },
      "descriptorName" : "String value",
      "pageCount" : 15,
      "pageIndex" : 15,
      "reportDate" : "String value",
      "reportName" : "String value",
      "reportTitle" : "String value",
      "reportUrl" : "String value"
    } ]
  }
}

I want my script to only use the information nested under "dataRows", but I can't figure out what to do. So far I have this:

response = rq.get(url, auth=(cpi_user,cpi_password), verify=False, timeout = 300)
    print(response.status_code)

    if (response.status_code == rq.codes.ok):
        responseJSON = response.json()
        rogue_ap_flatten = json_normalize(responseJSON)
        print (rogue_ap_flatten)
        rogues = pd.DataFrame(rogue_ap_flatten)
        print(rogues.head(50))
        return rogues

And all I get in return is:

                               mgmtResponse.@requestUrl  ...                         mgmtResponse.reportDataDTO
    0  https://prime/webacs/api/v4/op/reportS...  ...  [{'childReports': {'childReport': []}, 'dataRo...

[1 rows x 4 columns]

I've tried using just the .text method from requests, I've tried using another json flattening library (json_flatten) with the option to exclude certain keys, and I'm considering just using sed in python somehow. It doesn't need to work for other reports, just one, so I have some leeway to specify any specific keys or whatnot. How would y'all tackle this problem?

Maybe this will help: https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe — w8eight, Jan 09 '20 at 14:53

score 0 · Answer 1 · answered Jan 09 '20 at 14:57

You could try something like this. It just seems you are a bit confused by the format of your json response, I tried to address the structure I see below, but just take a careful look at how everything is nested.

import requests
import pandas as pd
df = pd.DataFrame()
req_session = requests.Session()
response = req_session.get('<url>')
df.append(response.json()['mgmtResponse']['reportDataDTO'][0]['dataRows'])

score 0 · Accepted Answer · answered Jan 09 '20 at 14:57

You should be able to get to just the dataRows stuff using something like this:

import json
data = {<your data>}
print(json.dumps(result['mgmtResponse']['reportDataDTO'][0]['dataRows'], indent=4))

Output

{
    "dataRow": [
        {
            "entries": {
                "entry": [
                    {
                        "attributeName": "String value",
                        "dataValue": "String value",
                        "displayName": "String value"
                    },
                    {
                        "attributeName": "Another string value",
                        "dataValue": "Another string value",
                        "displayName": "Another string value"
                    }
                ]
            }
        },
        {
            "entries": {
                "entry": [
                    {
                        "attributeName": "String value",
                        "dataValue": "String value",
                        "displayName": "String value"
                    },
                    {
                        "attributeName": "Another string value",
                        "dataValue": "Another string value",
                        "displayName": "Another string value"
                    }
                ]
            }
        }
    ]
}

score 0 · Answer 3 · answered Jan 09 '20 at 15:21

my assumption that is you want to build a panda DF based on the incoming values. json.loads could help because "object_hook" can be used to provide custom deserializations.

import pandas as pd
import numpy as np
import json


def filter_cols(element):
    # deal with each dataRow and build your the structure that will get into the data frame
    return {
        element['attributeName']: element['dataValue'],
    }



if __name__ == '__main__':
    # from api response
    # content = json.loads(response.text)

    # doing this because I have your data in a file
    with open("data.json", "r") as read_file:
        content = json.load(read_file)

    f = pd.DataFrame.from_dict(
        json.loads(json.dumps(content['mgmtResponse']['reportDataDTO'][0]['dataRows']), object_hook=filter_cols))

score 0 · Answer 4 · answered Jan 09 '20 at 15:28

If response has only one format then proposed previously solution result['mgmtResponse']['reportDataDTO'][0]['dataRows'] would suffice. But if your responses differ but you could establish some rules like e.g. response json contains useful dataRows data only in nested dict's or lists then you could write some util method to recursively search for it.

Example code:

def search_for_dataRows(data):
    if "dataRows" in data.keys():
        return data["dataRows"]
    else:
        result = None
        for value in data.values():
            if isinstance(value, dict):
                result = search_for_dataRows(value)
            elif isinstance(value, list):
                for list_item in value:
                    if isinstance(list_item, dict):
                        result = search_for_dataRows(list_item)
            if result:
                return result
        return None

response_json = response.json()
search_result = search_for_dataRows(response_json)

How do I properly unwrap this json response in Python to get the data I need into a Pandas DataFrame?

4 Answers4

Linked