0

I first try to normalize the data:

df = pd.json_normalize(balance_sheet_data_qt)

then I have tried flattening it using this answer, How to flatten a pandas dataframe with some columns as json? but doesnt seem to do anything.

json_struct = json.loads(df .to_json(orient="records"))    
#df_flat = pd.json_normalize(json_struct) 

Also tried this How to read and normalize following json in pandas?, but had an issue on what columns to use assign on.

sample data before normalizing aka balanace_sheet_data_qt

   {'balanceSheetHistoryQuarterly': {'AAPL': [{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 'commonStock': 51744000000, 'otherCurrentAssets': 13687000000, 'retainedEarnings': 14301000000, 'otherLiab': 56042000000, 'treasuryStock': 179000000, 'otherAssets': 43270000000, 'cash': 36010000000, 'totalCurrentLiabilities': 132507000000, 'shortLongTermDebt': 7762000000, 'otherStockholderEquity': 179000000, 'propertyPlantEquipment': 37933000000, 'totalCurrentAssets': 154106000000, 'longTermInvestments': 118745000000, 'netTangibleAssets': 66224000000, 'shortTermInvestments': 40816000000, 'netReceivables': 58620000000, 'longTermDebt': 99281000000, 'inventory': 4973000000, 'accountsPayable': 63846000000}}, {'2020-09-26': {'totalLiab': 258549000000, 'totalStockholderEquity': 65339000000, 'otherCurrentLiab': 47867000000, 'totalAssets': 323888000000, 'commonStock': 50779000000, 'otherCurrentAssets': 11264000000, 'retainedEarnings': 14966000000, 'otherLiab': 46108000000, 'treasuryStock': -406000000, 'otherAssets': 33952000000, 'cash': 38016000000, 'totalCurrentLiabilities': 105392000000, 'shortLongTermDebt': 8773000000, 'otherStockholderEquity': -406000000, 'propertyPlantEquipment': 45336000000, 'totalCurrentAssets': 143713000000, 'longTermInvestments': 100887000000, 'netTangibleAssets': 65339000000, 'shortTermInvestments': 52927000000, 'netReceivables': 37445000000, 'longTermDebt': 98667000000, 'inventory': 4061000000, 'accountsPayable': 42296000000}}, {'2020-06-27': {'totalLiab': 245062000000, 'totalStockholderEquity': 72282000000, 'otherCurrentLiab': 39945000000, 'totalAssets': 317344000000, 'commonStock': 48696000000, 'otherCurrentAssets': 10987000000, 'retainedEarnings': 24136000000, 'otherLiab': 47606000000, 'treasuryStock': -550000000, 'otherAssets': 32836000000, 'cash': 33383000000, 'totalCurrentLiabilities': 95318000000, 'shortLongTermDebt': 7509000000, 'otherStockholderEquity': -550000000, 'propertyPlantEquipment': 43851000000, 'totalCurrentAssets': 140065000000, 'longTermInvestments': 100592000000, 'netTangibleAssets': 72282000000, 'shortTermInvestments': 59642000000, 'netReceivables': 32075000000, 'longTermDebt': 94048000000, 'inventory': 3978000000, 'accountsPayable': 35325000000}}, {'2020-03-28': {'totalLiab': 241975000000, 'totalStockholderEquity': 78425000000, 'otherCurrentLiab': 42048000000, 'totalAssets': 320400000000, 'commonStock': 48032000000, 'otherCurrentAssets': 15691000000, 'retainedEarnings': 33182000000, 'otherLiab': 48745000000, 'treasuryStock': -2789000000, 'otherAssets': 33868000000, 'cash': 40174000000, 'totalCurrentLiabilities': 96094000000, 'shortLongTermDebt': 10392000000, 'otherStockholderEquity': -2789000000, 'propertyPlantEquipment': 43986000000, 'totalCurrentAssets': 143753000000, 'longTermInvestments': 98793000000, 'netTangibleAssets': 78425000000, 'shortTermInvestments': 53877000000, 'netReceivables': 30677000000, 'longTermDebt': 89086000000, 'inventory': 3334000000, 'accountsPayable': 32421000000}}]}}

sample data after normalizaing.

,balanceSheetHistoryQuarterly.AAPL
0,"[{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 

list of columns I want:

'totalLiab'
 'totalStockholderEquity'
 'otherCurrentLiab'
 'totalAssets'
 'commonStock'
 'otherCurrentAssets'
 'retainedEarnings'
 'otherLiab'
 'treasuryStock'
 'otherAssets'
 'cash'
 'totalCurrentLiabilities'
 'shortLongTermDebt'
 'otherStockholderEquity'
 'propertyPlantEquipment'
 'totalCurrentAssets'
 'propertyPlantEquipment'
 'totalCurrentAssets'
 'longTermInvestments'
 'netTangibleAssets'
 'shortTermInvestments'
 'netReceivables'
 'longTermDebt'
 'inventory'
 'accountsPayable'

I am trying to put this into dataframe/tabular format. I think the first row balanceSheetHistoryQuarterly.AAPL or date column may be throwing it off.

Any help is appreciated.

excelguy
  • 1,574
  • 6
  • 33
  • 67

1 Answers1

1

As the structure of data is like below

{
    'balanceSheetHistoryQuarterly': {
        'AAPL': [
            {
                '2020-12-26': {
                    'totalLiab': 287830000000,
                    'totalStockholderEquity': 66224000000,
                    'otherCurrentLiab': 55899000000
                }
            }, {
                '2020-06-27':
                {
                    'totalLiab': 245062000000,
                    'totalStockholderEquity': 72282000000,
                    'otherCurrentLiab': 39945000000
                }
            }
        ]
    }
}

You have to create a new array and iterate through AAPL and add its values to your list

listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    value = js.values()
    listInput += value
df = pd.json_normalize(listInput)

If you want to pass the date to dataframe, you have to create a new json each times you iterate through js.items() and js.keys()

listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    for key,value in zip(js.keys(), js.values()):
        new_json = value.copy()
        new_json["date"] = key
        listInput.append(new_json)
df = pd.json_normalize(listInput)
Doan Van Thang
  • 989
  • 1
  • 10
  • 21