Python: How to turn a dictionary of Dataframes into one big dataframe with column names being the key of the previous dict?

Question

So my dataframe is made from lots of individual excel files, each with the the date as their file name and the prices of the fruits on that day in the spreadsheet, so the spreadsheets look something like this:

15012016:
Fruit     Price
Orange    1
Apple     2
Pear      3

16012016:
Fruit     Price
Orange    4
Apple     5
Pear      6

17012016:
Fruit     Price
Orange    7
Apple     8
Pear      9

So to put all that information together I run the following code to put all the information into a dictionary of dataframes (all fruit price files stored in 'C:\Fruit_Prices_by_Day'

#find all the file names
file_list = []
for x in os.listdir('C:\Fruit_Prices_by_Day'):
    file_list.append(x) 

file_list= list(set(file_list))

d = {}

for date in Raw_list:
    df1 = pd.read_excel(os.path.join('C:\Fruit_Prices_by_Day', date +'.xlsx'), index_col = 'Fruit')
    d[date] = df1

Then this is the part where I'm stuck. How do I then make this dict into a dataframe where the column names are the dict keys i.e. the dates, so I can get the price of each fruit per day all in the same dataframe like:

          15012016   16012016   17012016   
Orange    1          4          7
Apple     2          5          8
Pear      3          6          9

score 25 · Answer 1 · edited Oct 09 '19 at 05:35

You can try first set_index of all dataframes in comprehension and then use concat with remove last level of multiindex in columns:

 print d
{'17012016':     Fruit  Price
0  Orange      7
1   Apple      8
2    Pear      9, '16012016':     Fruit  Price
0  Orange      4
1   Apple      5
2    Pear      6, '15012016':     Fruit  Price
0  Orange      1
1   Apple      2
2    Pear      3}
d = { k: v.set_index('Fruit') for k, v in d.items()}

df = pd.concat(d, axis=1)
df.columns = df.columns.droplevel(-1) 
print df
        15012016  16012016  17012016
Fruit                               
Orange         1         4         7
Apple          2         5         8
Pear           3         6         9

If you happen to encounter a trouble in `concat`, check for duplicate index values --> see https://stackoverflow.com/questions/27719407/pandas-concat-valueerror-shape-of-passed-values-is-blah-indices-imply-blah2 — ztl, Nov 25 '20 at 09:39

score 8 · Answer 2 · answered Jun 24 '20 at 11:40

Solution:

pd.concat(d, axis=1).sum(axis=1, level=0)

Explanation:

After .concat(d, axis=1) you will get

        15012016  16012016  17012016
        Price     Price     Price
Fruit                               
Orange       1         4         7
Apple        2         5         8
Pear         3         6         9

And adding .sum(axis=1, level=0) transforms it to

        15012016  16012016  17012016
Fruit                               
Orange       1         4         7
Apple        2         5         8
Pear         3         6         9

Throws `FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum(). ` — jtlz2, Jul 29 '22 at 07:40

score 6 · Answer 3 · answered May 20 '19 at 13:40

6

Something like this could work: loop over the dictionary, add the constant column with the dictionary key, concatenate and then set the date as index

pd.concat(
    (i_value_df.assign(date=i_key) for i_key, i_value_df in d.items())
).set_index('date')

answered May 20 '19 at 13:40

Dr Fabio Gori

1,105
16
21

Python: How to turn a dictionary of Dataframes into one big dataframe with column names being the key of the previous dict?

3 Answers3

Linked