Pandas merge for multiheader dataframe

Question

For one daily analysis script, I want to merge multi header data frame with another multi header data frame recursively. So basically my every data frame will have one column in common & rest columns will get added by date.

code:

import pandas as pd

L = [('Category','Fruits','colors'),(d,'A','C')]
cols = [(new, c) for new, start, end in L for c in df1.loc[:, start:end].columns]
df1.columns = pd.MultiIndex.from_tuples(cols)
df3 = pd.DataFrame(df1.values.tolist(), columns= pd.MultiIndex.from_tuples(cols))
df1


M = [('Category','Fruits','colors'),('20200605','A','C')]
cols = [(new, c) for new, start, end in M for c in df2.loc[:, start:end].columns]
df2.columns = pd.MultiIndex.from_tuples(cols)
df4 = pd.DataFrame(df2.values.tolist(), columns= pd.MultiIndex.from_tuples(cols))

####Dataframe merge
df5=pd.merge(df3,df4,on=['Category','Fruits','colors'],how='outer')

(d is date, which is changing daywise) Gives me error as below:

    ValueError: The column label 'Category' is not unique.
    For a multi-index, the label must be a tuple with elements corresponding to each level.

Likewise, another data frame need to be merge further.

Is data frame merger not possible with a multi header, should I need to remove the stack level? I tried pd.concat too but it creates duplicate entries of columns as well as rows. I want 'category' column to be unique and add values only in case of new entry.

is there any other way to achieve this?

You don't need to remove the stack level; you can use a multi-index (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html), and then you only need a 'join'. — Itamar Mushkin, Jun 14 '20 at 13:34
Final note - don't put images of code or data in your question. Post the data, or even better, the code that is used to create the example. This question should help you learn how. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples Good luck! — Itamar Mushkin, Jun 14 '20 at 13:36
Hi, thank you for answer. i can not drop columns as I wanted to add new categories as well if found in new data frame . basically original data frame is getting appeneded with everyday data.so any new entry in df should be covered in original — DD08, Jun 14 '20 at 13:41
Can you post a sample data? In the mean time can you try `df3=pd.merge(df1,df2,on=['Category', 'Fruits'],how='outer']` — XXavier, Jun 14 '20 at 13:45
@DD08 you can do it this time by editing your question. It will increase the chance of your question getting answered. — Itamar Mushkin, Jun 14 '20 at 13:52
@XXavier. Thank you I tried suggested option too, still facing issue — DD08, Jun 14 '20 at 15:51

score 1 · Answer 1 · answered Jun 14 '20 at 18:13

1

Looks like you have multi index columns. You can drop a level of your column index and merge the dataframes likes this

df3.columns = df3.columns.droplevel()
df4.columns = df4.columns.droplevel()

df3

Fruits  colors  A   B   C
0   Orange  Orange  a   4   7
1   Banana  yellow  b   5   8
2   Apple   red c   4   9
3   Grapes  green   d   5   4

df3.merge(df4, on=['Fruits', 'colors'])

    Fruits  colors  A_x B_x C_x A_y B_y C_y
0   Orange  Orange  a   4   7   a   4   7
1   Banana  yellow  b   5   8   b   5   8
2   Apple   red c   4   9   c   4   9
3   Grapes  green   d   5   4   d   5   4

answered Jun 14 '20 at 18:13

XXavier

1,206
1
10
14

Thank you. Yes, I can , for output again, I have to make it as multilevel after merging. as column names are based on date(dynamic). it will be great if you can guide me further. – DD08 Jun 14 '20 at 19:20
We can create a multiindex column name and assign it back but do you really need it? When you run the above merge every time it will add the newly added dates into the dataframe as new columns which i think is what you need right? – XXavier Jun 14 '20 at 19:58

Pandas merge for multiheader dataframe

1 Answers1