
Can anyone help me with this JSON format? (Data updated.)

JSON:

{'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025},  {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},  {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025},  {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025},  {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033},  {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033},  {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033},  {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033},  {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200,   'formatted_date': '2019-06-03',   'amount': 0.3},  {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},  {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4},  {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1},  {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}

I got it from this call, for example:

yahoo_financials.get_daily_dividend_data('2019-1-1', '2020-12-1')

I tried to convert it to a DataFrame with:


    data2 = {"data": {'VIG.VI': [{'date'......................................
    s = pd.DataFrame(data2)
    pd.concat([s.drop(columns='data'), pd.DataFrame(s.data.tolist(), index=s.index)], axis=1)

In this case I get a result like: 0 [{'date': 1433314500, 'formatted_date': '2015-... [{'date': 1430290500, 'formatted_date': '2015-...

Everything works if we use only one date and delete the [].

I also tried the code under this topic. It works fine if the format is the same for every variable inside the [], but with data like the example above I get the error "arrays must all be same length".

Does anyone have any idea how I could convert this type of JSON to a DataFrame?

Div_st_mil

2 Answers


You can convert each list of dicts into a dict of lists, then turn the resulting dict into a DataFrame with MultiIndex columns:

import pandas as pd
from collections import defaultdict

data2 = {"data": {'PSG.MC': [{'date': 1547452800,'formatted_date': '2019-01-14', 'amount': 0.032025},  {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},  {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025},  {'date': 1570690800, 'formatted_date': '2019-10-10', 'amount': 0.032025},  {'date': 1578902400, 'formatted_date': '2020-01-13', 'amount': 0.033},  {'date': 1588057200, 'formatted_date': '2020-04-28', 'amount': 0.033},  {'date': 1595228400, 'formatted_date': '2020-07-20', 'amount': 0.033},  {'date': 1601362800, 'formatted_date': '2020-09-29', 'amount': 0.033},  {'date': 1603436400, 'formatted_date': '2020-10-23', 'amount': 0.033}], 'ACX.MC': [{'date': 1559545200,   'formatted_date': '2019-06-03',   'amount': 0.3},  {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},  {'date': 1591254000, 'formatted_date': '2020-06-04', 'amount': 0.4},  {'date': 1594018800, 'formatted_date': '2020-07-06', 'amount': 0.1},  {'date': 1606809600, 'formatted_date': '2020-12-01', 'amount': 0.1}]}}

data = {}

for key, values in data2['data'].items():
    res = defaultdict(list)
    # Collect each field's values across this ticker's records.
    for sub in values:
        for k in sub:
            res[k].append(sub[k])
    data[key] = dict(res)

def reform_dict(data):
    reformed_dict = {}

    for outerKey, innerDict in data.items():
        for innerKey, values in innerDict.items():
            reformed_dict[(outerKey, innerKey)] = values

    return reformed_dict

df = pd.concat([pd.DataFrame(reform_dict({key: value})) for key, value in data.items()], axis=1)
print(df)

       PSG.MC                                 ACX.MC                      
         date formatted_date    amount          date formatted_date amount
0  1547452800     2019-01-14  0.032025  1.559545e+09     2019-06-03    0.3
1  1554361200     2019-04-04  0.032025  1.562137e+09     2019-07-03    0.2
2  1562310000     2019-07-05  0.032025  1.591254e+09     2020-06-04    0.4
3  1570690800     2019-10-10  0.032025  1.594019e+09     2020-07-06    0.1
4  1578902400     2020-01-13  0.033000  1.606810e+09     2020-12-01    0.1
5  1588057200     2020-04-28  0.033000           NaN            NaN    NaN
6  1595228400     2020-07-20  0.033000           NaN            NaN    NaN
7  1601362800     2020-09-29  0.033000           NaN            NaN    NaN
8  1603436400     2020-10-23  0.033000           NaN            NaN    NaN
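
As a rough sketch of a shorter route to the same column layout: `pd.concat` accepts a dict of DataFrames, and with `axis=1` the dict keys become the outer column level. The sample below is a shortened copy of the question's data, and the names `dividends` and `wide` are only illustrative. Tickers with fewer records are padded with NaN, as in the output above.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data.
dividends = {
    'PSG.MC': [
        {'date': 1547452800, 'formatted_date': '2019-01-14', 'amount': 0.032025},
        {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},
        {'date': 1562310000, 'formatted_date': '2019-07-05', 'amount': 0.032025},
    ],
    'ACX.MC': [
        {'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3},
        {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},
    ],
}

# Build one DataFrame per ticker, then place them side by side.
# The dict keys (tickers) become the outer level of the column MultiIndex.
wide = pd.concat(
    {ticker: pd.DataFrame(rows) for ticker, rows in dividends.items()},
    axis=1,
)
print(wide)
```

Because each ticker gets its own DataFrame before the concat, lists of different lengths should not trigger the "same length" error.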
Ynjxsjmh
  • Thank you for this code. It works well when the arrays are the same length. However, if I add more tickers and dates, it seems to raise an error: – Div_st_mil Apr 23 '21 at 10:49

Thank you for your code and help.

Here is my code. It works well and outputs a tidy table with the data I needed; maybe it will be helpful for someone:

getDividends:

from yahoofinancials import YahooFinancials
import pandas as pd

def getDividends(tickers, start_date, end_date):
    # Fetch dividend data for the given tickers between the two dates.
    yahoo_financials = YahooFinancials(tickers)
    dividends = yahoo_financials.get_daily_dividend_data(start_date, end_date)
    return dividends

getDividendDataFrame:

def getDividendDataFrame(tickerList):
    dividendList = getDividends(tickerList, '2015-1-1', '2020-12-1')
    rows = []

    for ticker in dividendList:
        for dividend in dividendList[ticker]:
            # One row per dividend payment.
            rows.append([ticker, dividend['formatted_date'], dividend['amount']])

    # Build the table once instead of concatenating inside the loop.
    dataFrame = pd.DataFrame(rows, columns=['Ticker', 'formatted_date', 'amount'])
    return dataFrame
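
To try the same flattening without calling the API, here is a minimal sketch that works on a dict that has already been fetched. The helper name `dividends_to_long_frame` and the sample values (copied from the question) are illustrative only. The `or []` guard is a defensive assumption in case a ticker maps to `None`; whether the library ever returns that is not confirmed here.

```python
import pandas as pd

def dividends_to_long_frame(dividends):
    """Flatten {ticker: [{'formatted_date': ..., 'amount': ...}, ...]} to one row per dividend."""
    rows = [
        {'Ticker': ticker,
         'formatted_date': record['formatted_date'],
         'amount': record['amount']}
        for ticker, records in dividends.items()
        for record in (records or [])  # assumption: skip tickers whose value is None or empty
    ]
    frame = pd.DataFrame(rows, columns=['Ticker', 'formatted_date', 'amount'])
    # Parse the date strings so the column can be sorted and filtered as dates.
    frame['formatted_date'] = pd.to_datetime(frame['formatted_date'])
    return frame

# Shortened sample in the same shape as the question's data.
sample = {
    'PSG.MC': [
        {'date': 1547452800, 'formatted_date': '2019-01-14', 'amount': 0.032025},
        {'date': 1554361200, 'formatted_date': '2019-04-04', 'amount': 0.032025},
    ],
    'ACX.MC': [
        {'date': 1559545200, 'formatted_date': '2019-06-03', 'amount': 0.3},
        {'date': 1562137200, 'formatted_date': '2019-07-03', 'amount': 0.2},
    ],
}

long_df = dividends_to_long_frame(sample)
print(long_df)
```

The long layout keeps one row per payment, so tickers with different numbers of dividends need no NaN padding.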
Div_st_mil