Merging multiple dataframes into one, with each dataframe's name as a header over its columns, to create a 3D dataframe

Question

I have multiple dataframes, df1, df2, df3, and so on up to df10. Each dataframe has 135 columns and looks like this:

time	a	b	c	d	e	f	g
1	2	3	4	5	6	7	8

I want to arrange them in one dataframe, stacked side by side, with each dataframe's name as a header. That is, one heading df1 with all of its column names (time, a, b, ...) and their values under it, and so on for the others. Following the example in Constructing 3D Pandas DataFrame, I tried the following code:

    list1 = ['df1', 'df2', 'df3', 'df4', 'df5', 'df6', 'df7', 'df8', 'df9', 'df10']
    list2 = []
    for df in list1:
        for i in range(135):
            list2.append(df)
    A = np.array(list2)
    B = np.array([df1.columns] * 10)
    C = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8, df9, df10], axis=1)
    C = C.values.tolist()
    C = np.array(C)
    df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(zip(A, B)))
    print(df)

But each time I get the error TypeError: unhashable type: 'numpy.ndarray'. I have a column time, where the times are in hh:mm format (01:00, 01:01, and so on). I tried dropping that column from the dataframes but got the same error. How can I fix this? Can anyone help?

n1colas.m · Accepted Answer · 2021-07-04T19:52:23.320

You could use the keys parameter of the pandas concat function (either build the names with an f-string over the correct range, or pass your already defined list1):

keys : sequence, default None

If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level.

import pandas as pd
import numpy as np

# setup
np.random.seed(12345)
all_df_list = []
for i in range(3):
    d = {
        'time': (pd.timedelta_range(start='00:01:00', periods=5, freq='1s')
                    + pd.Timestamp("00:00:00")).strftime("%M:%S"),
        'a': np.random.rand(5),
        'b': np.random.rand(5),
        'c': np.random.rand(5),
    }
    all_df_list.append(pd.DataFrame(d).round(2))

# code
dfc = pd.concat(all_df_list, axis=1,
        keys=[f'df{i}' for i in range(1,4)]) # use the correct 'range' or your already defined 'list1'

dfc = dfc.set_index(dfc.df1.time)
dfc = dfc.drop('time', axis=1, level=1)
print(dfc)

        df1               df2               df3
          a     b     c     a     b     c     a     b     c
time
01:00  0.93  0.60  0.75  0.66  0.64  0.73  0.03  0.53  0.82
01:01  0.32  0.96  0.96  0.81  0.72  0.99  0.80  0.60  0.50
01:02  0.18  0.65  0.01  0.87  0.47  0.68  0.90  0.05  0.81
01:03  0.20  0.75  0.11  0.96  0.33  0.79  0.02  0.90  0.10
01:04  0.57  0.65  0.30  0.72  0.44  0.17  0.49  0.73  0.22
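As a side note on the TypeError in the question: np.array([df1.columns]*10) is a 2D array with one row per dataframe, so zip(A, B) pairs each name with a whole row of column labels, and an array cannot be hashed as an index label. If you would rather build the two-level header by hand, a minimal sketch with MultiIndex.from_product could look like this. It assumes every frame has identical columns; the small random frames and the names frames, names, header and wide are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1..df3; all share the same columns.
rng = np.random.default_rng(0)
frames = [pd.DataFrame(rng.random((4, 3)), columns=['a', 'b', 'c'])
          for _ in range(3)]
names = [f'df{i}' for i in range(1, 4)]

# Outer level: frame name, inner level: column name.
# from_product repeats the inner labels under each outer label,
# in the same order as the side-by-side concat below.
header = pd.MultiIndex.from_product([names, frames[0].columns])

wide = pd.concat(frames, axis=1)
wide.columns = header
print(wide.head())
```

Passing keys to concat, as above, gives the same layout with less bookkeeping, so this is mainly useful for seeing where the original attempt went wrong.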

Extracting columns a and b from df2

In [190]: dfc.df2[['a','b']]
Out[190]:
          a     b
time
01:00  0.66  0.64
01:01  0.81  0.72
01:02  0.87  0.47
01:03  0.96  0.33
01:04  0.72  0.44
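The same selection can also be written with .loc and a tuple key, and xs picks one column name across all frames. A small sketch, assuming a frame laid out like dfc above; it is rebuilt here with made-up values (and without the time index) so the snippet runs on its own, and the names parts, sub and all_a are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a small frame with the same column layout as dfc (values are made up).
rng = np.random.default_rng(1)
parts = [pd.DataFrame(rng.random((3, 3)).round(2), columns=['a', 'b', 'c'])
         for _ in range(3)]
dfc = pd.concat(parts, axis=1, keys=['df1', 'df2', 'df3'])

# Columns a and b of df2, keeping the two-level header.
sub = dfc.loc[:, ('df2', ['a', 'b'])]

# Column a from every frame; the inner level is dropped by default.
all_a = dfc.xs('a', axis=1, level=1)

print(sub)
print(all_a)
```

Unlike dfc.df2[['a','b']], the .loc form keeps df2 in the header, which can help if the result is later joined back with columns from other frames.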

How can I take the 'time' columns from all the dataframes and set one common time column as the index of the full dataframe? Also, how can I subset specific columns from a specific dataframe? Let's say I want to extract columns a and b for df2. How can I do that? - Jewel_R, Jul 04 '21 at 18:17
