-1

My first question in stackoverflow!

I have a triple nested dictionary and I want to convert it to pandas df. The dictionary has the following structure:

dictionary = {'CompanyA': {'Revenue': {date1 : $1}, {date2: $2}},... 
                          {'ProfitLoss': {date1 : $0}, {date2: $1}}},
              'CompanyB': {'Revenue': {date1 : $1}, {date2: $2}},... 
                          {'ProfitLoss': {date1 : $0}, {date2: $1}}},
              'CompanyC': {'Revenue': {date1 : $1}, {date2: $2}},...
                          {'ProfitLoss': {date1 : $0}, {date2: $1}}}}

So far I been able to construct a df using:

df = pd.DataFrame.from_dict(dictionary)

But the results its a df with values as dictionaries like this:

          CompanyA          CompanyB          CompanyC
Revenue   {date1:$0,..}     {date1:$1,..}     {date1:$0,..} 
ProfitLoss{date1:$0,..}     {date1:$0,..}     {date1:$0,..}

I want the table to look like this:

                    CompanyA    CompanyB    CompanyC  
Revenue    Date1       $1          $1          $1 
           Date2       $2          $2          $2

ProfitLoss Date1       $0          $0          $0
           Date2       $1          $1          $1

I had tried using pd.MultiIndex.from_dict (.from_product) and change the index, with no result. Any idea what to do next? Any hint will be appreciated!

Edmond Dantes
  • 119
  • 1
  • 1
  • 4
  • Possible duplicate of [Construct pandas DataFrame from items in nested dictionary](https://stackoverflow.com/questions/13575090/construct-pandas-dataframe-from-items-in-nested-dictionary) – Edgar Ramírez Mondragón Oct 08 '18 at 06:28

1 Answers1

0

I see you're new, but there may be an answer to a similar question, see this. Next time try looking for a similar question using keywords. For example, I found the one I linked by searching "pandas nested dict", and that's it, the first link was the SO post!

Anyway, you need to reshape your input dict. You want a dict structured like this:

{
    'CompanyA': {
        ('Revenue', 'date1'): 1,
        ('ProfitLoss', 'date1'): 0,
    }
    ...
}

I would do something like this:

import pandas as pd

data = {
    'CompanyA': {
        'Revenue': {
            "date1": 1,
            "date2": 2
        },
        'ProfitLoss': {
            "date1": 0,
            "date2": 1
        }
    },
    'CompanyB': {
        'Revenue': {
            "date1": 4,
            "date2": 5
        },
        'ProfitLoss': {
            "date1": 2,
            "date2": 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                    for j in data[i] for k in data[i][j]}
                                for i in data}, orient="columns")

print(df)

Output:

                  CompanyA  CompanyB
ProfitLoss date1         0         2
           date2         1         3
Revenue    date1         1         4
           date2         2         5

EDIT

Using actual datetimes to respond to your comment:

import pandas as pd
import datetime as dt

date1 = dt.datetime.now()
date2 = date1 + dt.timedelta(days=365)

data = {
    'CompanyA': {
        'Revenue': {
            date1: 1,
            date2: 2
        },
        'ProfitLoss': {
            date1: 0,
            date2: 1
        }
    },
    'CompanyB': {
        'Revenue': {
            date1: 4,
            date2: 5
        },
        'ProfitLoss': {
            date1: 2,
            date2: 3
        }
    }
}

# Reshape your data and pass it to `DataFrame.from_dict`
df = pd.DataFrame.from_dict({i: {(j, k): data[i][j][k]
                                    for j in data[i] for k in data[i][j]}
                                for i in data}, orient="columns")

print(df)

Output:

                                       CompanyA  CompanyB
ProfitLoss 2018-10-08 11:19:09.006375         0         2
           2019-10-08 11:19:09.006375         1         3
Revenue    2018-10-08 11:19:09.006375         1         4
           2019-10-08 11:19:09.006375         2         5
  • Edgar, thank you for taking your time answering my question. Following your example, I got to the same result as I got before using Pandas MultiIndex. In both case I got the right table format but no data just NaN inside the table. Could the date format ( datetime.date(2018, 3, 31): 1.0 ) had something to do with populating the table wrong? – Edmond Dantes Oct 08 '18 at 14:57
  • @edmond-dantes, check the structure of your dictionary. I think `dictionary = {'CompanyA': {'Revenue': {date1 : $1}, {date2: $2}},... ` should be `dictionary = {'CompanyA': {'Revenue': {date1 : $1, date2: $2}},... `. – Edgar Ramírez Mondragón Oct 08 '18 at 16:22
  • the structure of the original dictionary it's Ok, I misplaced a { }, sorry about that! Continuing with this, it's looks like that there's a bug according to this post at GitHub. It's exactly my problem with the dates: [https://github.com/pandas-dev/pandas/issues/19993] – Edmond Dantes Oct 09 '18 at 01:23
  • I will consider my question answered! – Edmond Dantes Oct 09 '18 at 03:35