I first try to normalize the data:
df = pd.json_normalize(balance_sheet_data_qt)
then I have tried flattening it using this answer, How to flatten a pandas dataframe with some columns as json? but doesnt seem to do anything.
json_struct = json.loads(df .to_json(orient="records"))
#df_flat = pd.json_normalize(json_struct)
Also tried this How to read and normalize following json in pandas?, but had an issue on what columns to use assign
on.
sample data before normalizing aka balanace_sheet_data_qt
{'balanceSheetHistoryQuarterly': {'AAPL': [{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 'commonStock': 51744000000, 'otherCurrentAssets': 13687000000, 'retainedEarnings': 14301000000, 'otherLiab': 56042000000, 'treasuryStock': 179000000, 'otherAssets': 43270000000, 'cash': 36010000000, 'totalCurrentLiabilities': 132507000000, 'shortLongTermDebt': 7762000000, 'otherStockholderEquity': 179000000, 'propertyPlantEquipment': 37933000000, 'totalCurrentAssets': 154106000000, 'longTermInvestments': 118745000000, 'netTangibleAssets': 66224000000, 'shortTermInvestments': 40816000000, 'netReceivables': 58620000000, 'longTermDebt': 99281000000, 'inventory': 4973000000, 'accountsPayable': 63846000000}}, {'2020-09-26': {'totalLiab': 258549000000, 'totalStockholderEquity': 65339000000, 'otherCurrentLiab': 47867000000, 'totalAssets': 323888000000, 'commonStock': 50779000000, 'otherCurrentAssets': 11264000000, 'retainedEarnings': 14966000000, 'otherLiab': 46108000000, 'treasuryStock': -406000000, 'otherAssets': 33952000000, 'cash': 38016000000, 'totalCurrentLiabilities': 105392000000, 'shortLongTermDebt': 8773000000, 'otherStockholderEquity': -406000000, 'propertyPlantEquipment': 45336000000, 'totalCurrentAssets': 143713000000, 'longTermInvestments': 100887000000, 'netTangibleAssets': 65339000000, 'shortTermInvestments': 52927000000, 'netReceivables': 37445000000, 'longTermDebt': 98667000000, 'inventory': 4061000000, 'accountsPayable': 42296000000}}, {'2020-06-27': {'totalLiab': 245062000000, 'totalStockholderEquity': 72282000000, 'otherCurrentLiab': 39945000000, 'totalAssets': 317344000000, 'commonStock': 48696000000, 'otherCurrentAssets': 10987000000, 'retainedEarnings': 24136000000, 'otherLiab': 47606000000, 'treasuryStock': -550000000, 'otherAssets': 32836000000, 'cash': 33383000000, 'totalCurrentLiabilities': 95318000000, 'shortLongTermDebt': 7509000000, 'otherStockholderEquity': -550000000, 'propertyPlantEquipment': 43851000000, 'totalCurrentAssets': 140065000000, 'longTermInvestments': 100592000000, 'netTangibleAssets': 72282000000, 'shortTermInvestments': 59642000000, 'netReceivables': 32075000000, 'longTermDebt': 94048000000, 'inventory': 3978000000, 'accountsPayable': 35325000000}}, {'2020-03-28': {'totalLiab': 241975000000, 'totalStockholderEquity': 78425000000, 'otherCurrentLiab': 42048000000, 'totalAssets': 320400000000, 'commonStock': 48032000000, 'otherCurrentAssets': 15691000000, 'retainedEarnings': 33182000000, 'otherLiab': 48745000000, 'treasuryStock': -2789000000, 'otherAssets': 33868000000, 'cash': 40174000000, 'totalCurrentLiabilities': 96094000000, 'shortLongTermDebt': 10392000000, 'otherStockholderEquity': -2789000000, 'propertyPlantEquipment': 43986000000, 'totalCurrentAssets': 143753000000, 'longTermInvestments': 98793000000, 'netTangibleAssets': 78425000000, 'shortTermInvestments': 53877000000, 'netReceivables': 30677000000, 'longTermDebt': 89086000000, 'inventory': 3334000000, 'accountsPayable': 32421000000}}]}}
sample data after normalizaing.
,balanceSheetHistoryQuarterly.AAPL
0,"[{'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000,
list of columns I want:
'totalLiab'
'totalStockholderEquity'
'otherCurrentLiab'
'totalAssets'
'commonStock'
'otherCurrentAssets'
'retainedEarnings'
'otherLiab'
'treasuryStock'
'otherAssets'
'cash'
'totalCurrentLiabilities'
'shortLongTermDebt'
'otherStockholderEquity'
'propertyPlantEquipment'
'totalCurrentAssets'
'propertyPlantEquipment'
'totalCurrentAssets'
'longTermInvestments'
'netTangibleAssets'
'shortTermInvestments'
'netReceivables'
'longTermDebt'
'inventory'
'accountsPayable'
I am trying to put this into dataframe/tabular format. I think the first row balanceSheetHistoryQuarterly.AAPL
or date column may be throwing it off.
Any help is appreciated.