For one daily analysis script, I want to merge multi header data frame with another multi header data frame recursively. So basically my every data frame will have one column in common & rest columns will get added by date.
code:
import pandas as pd
L = [('Category','Fruits','colors'),(d,'A','C')]
cols = [(new, c) for new, start, end in L for c in df1.loc[:, start:end].columns]
df1.columns = pd.MultiIndex.from_tuples(cols)
df3 = pd.DataFrame(df1.values.tolist(), columns= pd.MultiIndex.from_tuples(cols))
df1
M = [('Category','Fruits','colors'),('20200605','A','C')]
cols = [(new, c) for new, start, end in M for c in df2.loc[:, start:end].columns]
df2.columns = pd.MultiIndex.from_tuples(cols)
df4 = pd.DataFrame(df2.values.tolist(), columns= pd.MultiIndex.from_tuples(cols))
####Dataframe merge
df5=pd.merge(df3,df4,on=['Category','Fruits','colors'],how='outer')
(d is date, which is changing daywise) Gives me error as below:
ValueError: The column label 'Category' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
Likewise, another data frame need to be merge further.
Is data frame merger not possible with a multi header, should I need to remove the stack level? I tried pd.concat
too but it creates duplicate entries of columns as well as rows. I want 'category' column to be unique and add values only in case of new entry.
is there any other way to achieve this?