
I am referring to the solution mentioned in this post.

I am trying to convert deeply nested JSON into a pandas DataFrame:

import pandas as pd

records = []

for item in d["Data"]:
    records.append(
        {
            "Type": item["CoinInfo"]["Type"],
            "Name": item["CoinInfo"]["Name"],
            "SUPPLY": item["RAW"]["USD"]["SUPPLY"],
        }
    )


df = pd.DataFrame.from_records(records)

df.head()

This helps me parse the data from the dictionary; however, it raises an error when a key does not exist. Is there a way to modify this code so it returns null (None) when the key is missing? Please help.

Karthik Viz

4 Answers


You're looking for dict.get (https://docs.python.org/3/library/stdtypes.html#dict.get):

d = {}
x = d.get("non_existing_key", None)  # note: dict.get does not accept a default= keyword; pass the default positionally
assert x is None

If keys might be missing at any nesting level, use a helper like this:

def nested_get(d, *keys, default=None):
    value = d
    try:
        for key in keys:
            value = value[key]
    except (KeyError, TypeError):  # TypeError covers intermediate values that are not dicts
        return default
    return value


d = {"key": {}}
x = nested_get(d, "key", "missing_nested_key", default=None)
assert x is None
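Applied to the structure from the question, the helper replaces every chained lookup. The sample item below is hypothetical, and the helper is repeated (catching TypeError as well, in case an intermediate value is not a dict) so the snippet runs on its own:

```python
def nested_get(d, *keys, default=None):
    value = d
    try:
        for key in keys:
            value = value[key]
    except (KeyError, TypeError):  # TypeError covers intermediate values that are not dicts
        return default
    return value

# Hypothetical item in the question's shape; the "RAW" branch is missing entirely
item = {"CoinInfo": {"Type": 1, "Name": "BTC"}}

record = {
    "Type": nested_get(item, "CoinInfo", "Type"),
    "Name": nested_get(item, "CoinInfo", "Name"),
    "SUPPLY": nested_get(item, "RAW", "USD", "SUPPLY"),
}
print(record)  # {'Type': 1, 'Name': 'BTC', 'SUPPLY': None}
```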
RafalS

Try using get, which returns its second argument when the first does not exist as a key in your dictionary.

Keep in mind that you can't call get on None, so if you want to access a nested key and you are not sure whether the root-level key exists, pass an empty dict ({}) as the second argument so the inner get can be chained safely.

Here is an example:

import pandas as pd

records = []

for item in d["Data"]:
    usd = item.get("RAW", {}).get("USD", {})
    supply = usd.get("SUPPLY") if isinstance(usd, dict) else usd
    records.append(
        {
            "Type": item.get("CoinInfo", {}).get("Type"),
            "Name": item.get("CoinInfo", {}).get("Name"),
            "SUPPLY": supply,
        }
    )


df = pd.DataFrame.from_records(records)

df.head()
Gabio
  • Thank you Gabio. It is working fine for most of the items, however I am getting an error for some: "AttributeError: 'str' object has no attribute 'get'". Any advice? – Karthik Viz Apr 19 '20 at 10:37
  • You can use `get` only on a dict. The error occurs because you are applying `get` to a string instead of a dict. For example, in `item.get("RAW",{}).get("USD",{}).get("SUPPLY")`, if the `USD` key has a string value, the code will throw because you can't call `get("SUPPLY")` on a string (only on dicts). – Gabio Apr 19 '20 at 10:41
  • Thank you for your response. How can I handle it if USD holds a string instead of a dict, any advice? – Karthik Viz Apr 19 '20 at 13:03
  • How do you want to get the SUPPLY key out of a string? – Gabio Apr 19 '20 at 13:10
  • SUPPLY is a key that will have a value for some of the records; wherever it exists, I would like to get it into the data frame. – Karthik Viz Apr 19 '20 at 13:11
  • So sometimes USD will be a dict and sometimes a string? And if it is a dict you want to extract SUPPLY, otherwise you want to return USD as-is. Am I right? – Gabio Apr 19 '20 at 13:18
  • That is correct. – Karthik Viz Apr 19 '20 at 13:46
  • @KarthikViz I've updated my solution, please try now. – Gabio Apr 19 '20 at 14:00

Handling the exception with try ... except will solve the issue.

The program below initializes each record with default values of None.

The values are updated inside the for loop only if there is no exception (i.e., only if the key is found).

Here is the working example with the exception handling implemented:

# File name: CryptoCompare.py

import pandas as pd
import requests

def get_cryptocompare_data():
    url = "https://min-api.cryptocompare.com/data/top/mktcapfull?limit=15&tsym=USD"
    data = requests.get(url)
    d = data.json()
    records = []
    for item in d["Data"]:

        # Initialize a record with null values
        record = {"Type": None, "Name": None, "Supply": None}

        try:

            # Update values for the available keys
            record["Type"] = item["CoinInfo"]["Type"]
            record["Name"] = item["CoinInfo"]["Name"]
            record["Supply"] = item["RAW"]["USD"]["SUPPLY"]

            # Try a key that does not exist
            # record["Supply"] = item["abc"]["ijk"]["xyz"]

        except KeyError:
            # Handle the exception
            print("Key not found. Therefore, using the default 'None' value")

        # Append record to data set
        records.append(record)

    df = pd.DataFrame.from_records(records)
    # Show all records
    print(df)

get_cryptocompare_data()

# End of program

Output (all keys are found):

> python CryptoCompare.py

   Type  Name        Supply
0     1   BTC  1.833352e+07
1     1   ETH  1.105811e+08
2     1   XRP  9.999185e+10
3     1  GAPS  2.000000e+09
4     1   CRO  1.000000e+11
5     1  USDT  4.642367e+09
6     1   BCH  1.838416e+07
7     1   PLF  1.000000e+10
8     1  CTAG  4.000000e+09
9     1  LINK  1.000000e+09

Output (when a key is not found):

> python CryptoCompare.py

Key not found. Therefore, using the default 'None' value
...



    Type  Name Supply
0     1   BTC   None
1     1   ETH   None
2     1   XRP   None
3     1  GAPS   None
4     1   CRO   None
5     1  USDT   None
6     1   BCH   None
7     1   PLF   None
8     1  CTAG   None
9     1  LINK   None
Gopinath
  • Tried this, but getting an error: 'list indices must be integers or slices, not str'. Any thoughts? – Karthik Viz Apr 19 '20 at 12:46
  • Hi Karthik, the 'list indices must be integers' error means a list is being indexed with a string key; most likely one of the nested values is a list of dicts rather than a dict. Can you share the complete error message, so that the root cause can be found? – Gopinath Apr 19 '20 at 13:10
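For reference, that error is raised whenever a list is indexed with a string key. A minimal reproduction with a hypothetical item, plus an isinstance guard (my addition, not part of the original answer):

```python
# Hypothetical item where "RAW" is a list of dicts instead of a dict
item = {"RAW": [{"USD": {"SUPPLY": 100}}]}

# Indexing the list with a string key raises the TypeError from the comment above
try:
    item["RAW"]["USD"]
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str

# One way to guard: only descend when the value is actually a dict
raw = item["RAW"]
usd = raw.get("USD") if isinstance(raw, dict) else None
print(usd)  # None
```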

I see you mentioned that json_normalize didn't work for you, but I think the following code should solve the problem:

data = pd.json_normalize(d["Data"])
cols = ["CoinInfo.Type", "CoinInfo.Name", "RAW.USD.SUPPLY"]
names = {"CoinInfo.Type": "Type", "CoinInfo.Name": "Name", "RAW.USD.SUPPLY": "SUPPLY"}

df = data[cols].rename(names, axis=1)

As per my understanding, this also has performance benefits when you're working with larger datasets.
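One detail worth noting (my addition, checked against a tiny hypothetical payload rather than the real API): json_normalize fills missing nested paths with NaN on its own, so no per-key handling is needed:

```python
import pandas as pd

# Hypothetical payload in the same shape as the API response;
# the second item has no "RAW" branch at all
d = {"Data": [
    {"CoinInfo": {"Type": 1, "Name": "BTC"}, "RAW": {"USD": {"SUPPLY": 18_000_000}}},
    {"CoinInfo": {"Type": 1, "Name": "ETH"}},
]}

data = pd.json_normalize(d["Data"])
cols = ["CoinInfo.Type", "CoinInfo.Name", "RAW.USD.SUPPLY"]
names = {"CoinInfo.Type": "Type", "CoinInfo.Name": "Name", "RAW.USD.SUPPLY": "SUPPLY"}
df = data[cols].rename(names, axis=1)

print(df["SUPPLY"].tolist())  # [18000000.0, nan] -- the missing key becomes NaN
```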

Pushkar Nimkar