0

I want to join data frames using pandas.

I have several data frames. For example, two of them are:

df1=

           A      B      C
0    1    1.1    1.2    1.3
1    2    1.4    1.5    1.4

and df2=

           A      B      C
0    1    2.1    2.2    2.3
1    2    2.4    2.5    2.4

I would like to join them into 1 data frame like:

                  A      B      C
R1     0    1    1.1    1.2    1.3
       1    2    1.4    1.5    1.4
                  A      B      C
R2     0    1    2.1    2.2    2.3
       1    2    2.4    2.5    2.4

I tried to put them into a dictionary and convert them to the DataFrame like:

dict_val = {}
dict_val['R1'] = df1
dict_val['R2'] = df2
df_join = pd.DataFrame.from_dict(dict_val)

however, I am getting the following error: ValueError: Must pass 2-d input.

How can I make this type of data frame?

Tutu
  • 1
  • 3
  • 2
    Try `df_join = pd.concat(dict)` (NB. Don't use `dict` as variable name, this is a python built-in) – mozway Mar 15 '23 at 16:55
  • Does this answer your question? [Concatenate a list of pandas dataframes together](https://stackoverflow.com/questions/32444138/concatenate-a-list-of-pandas-dataframes-together) – Michael Cao Mar 15 '23 at 16:55
  • I am not using dict as a variable I forgot to change it here (I will modify it now). The variables aren't causing an issue – Tutu Mar 15 '23 at 16:56
  • fyi -- you should read the docs of `pandas.DataFrame.from_dict()`. It doesn't expect a dictionary of already created data frames. In the "Getting Started" section of documentation, there's a header labeled "How do I combine data from multiple tables?". You should read that too. – Paul H Mar 15 '23 at 16:58
  • @MichaelCao, I need to add an additional column to each data frame and then specify the indices. I wondered if there is any way to convert a dictionary with pandas.DataFrame as its values to another pandas.DataFrame? – Tutu Mar 15 '23 at 17:01
  • @PaulH, thanks for the information. But they do not sure this example. I need to specify 'R1' and 'R2' (and so on) for each data frame I am joining – Tutu Mar 15 '23 at 17:06
  • This example *is* demonstrated. Read more closely the examples about air quality that use the `key` argument – Paul H Mar 15 '23 at 17:09
  • Have you tested `concat` as I suggested? – mozway Mar 15 '23 at 17:09

1 Answers1

0

Try df_join = pd.concat(dict) (NB. Don't use dict as variable name, this is a python built-in)

- comment by mozway

concat is the simplest [and probably the most efficient] way to join multiple DataFrames together.

I need to add an additional column to each data frame and then specify the indices - [about R1 and R2]

You can use the keys argument to do so

df_join = pd.concat([df1,df2], keys=['R1','R2'])

# for a list of DataFrames of unknown length: 
# df_join = pd.concat(dfs_list, keys=[f'R{i}' for i in range(1, len(dfs_list)+1)])

And if you really want to combine them as dictionaries first for some reason, you can do it on loop

dfs_list = [df1, df2]
index_join, data_join = [], []
for di, df in enumerate(dfs_list):
    data_join += df.to_dict('records')

    index_join += [(f'R{di}', *i) for i in df.index] # if df is multi-indexed
    # index_join += list(df.index) # if df is not multi-indexed

df_join = pd.DataFrame(data=data_join, index=pd.MultiIndex.from_tuples(index_join))   

or with dictionary comprehension

df_join = pd.DataFrame.from_dict({
    (f'R{di}', *df_index): df_row for di,df in enumerate([df1,df2]) 
    for df_index, df_row in df.to_dict('index').items()
}, orient="index")

which effectively does the same as if you edited your code to

dict_val = {}
dict_val.update({('R1',i1,i2):row for (i1,i2), row in df1.to_dict('index').items()})
dict_val.update({('R2',i1,i2):row for (i1,i2), row in df2.to_dict('index').items()})
df_join = pd.DataFrame.from_dict(dict_val, orient="index")
Driftr95
  • 4,572
  • 2
  • 9
  • 21