I have three problems I am trying to solve. I have an Excel file with multiple sheets. I am trying to take those sheets, which have different column names but all contain an ID (named differently in each sheet), and combine them into one large data frame for analysis. My main issue: if different sheets share a column name, I want to combine that column's row values into a list and then apply set(list) to remove the duplicates. That is the specific problem I am looking to solve, but looking at the bigger picture there might be a better function or approach for the whole task.
I found the following on here, in "Python: pandas merge multiple dataframes", but it doesn't show how to combine multiple columns with the same name into rows joined by a comma or collected into a list:
from functools import reduce
import pandas as pd

# data_frames is the list of DataFrames to merge on the shared 'DATE' column
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)
# to fill the values that don't exist in the rows of the merged dataframe, fill with the required string
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')
After combining the dataframes, if a column name appears in more than one of them, I want to merge that column's rows into a list, then apply set() to each row to remove the duplicates within that row. I am looking for an output like this:
           date        car                    outlet   code
ID_number
66         2012/05/12  [Mercedes Benz]        [AA]     [2061]
156        2012/10/24  [Mercedes Benz,Honda]  [CC,DD]  [2031,2401]
133        2012/06/11  [Volvo]                [BB]     [2032]
142        2012/02/09  [BMW,Lexus]            [CC,EE]  [2129,3045]
63         2012/03/16  [Mercedes Benz]        [AA]     [2161]
156        2012/11/29  [Volvo]                [CC]     [2171]
184        2012/08/25  [Mercedes Benz]        [BB]     [2122]
175        2012/05/30  [Volvo]                [CC]     [2089]
181        2012/04/12  [BMW]                  [AA]     [2080]
122        2012/10/28  Volvo                  AA       2189
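For reference, here is a minimal sketch of one way the list-per-ID shape might be produced. It stacks the sheets with pd.concat instead of merging them, so same-named columns line up on their own, and then groups by the ID. The sheet names, the ID column names ("id", "ID_num"), the sample values, and the helpers unique_values and combine_sheets are all made up for illustration. With a real workbook the sheets dict could come from pd.read_excel(path, sheet_name=None), which returns one DataFrame per sheet.

```python
import pandas as pd


def unique_values(series):
    """Return one ID's non-missing values, de-duplicated in first-seen order."""
    return list(dict.fromkeys(series.dropna()))


def combine_sheets(sheets, id_columns, id_name="ID_number"):
    """Stack all sheets under a common ID column, then build one row per ID.

    sheets     -- dict of sheet name -> DataFrame (hypothetical input)
    id_columns -- dict of sheet name -> that sheet's ID column name (assumed known)
    """
    renamed = [
        # Cast to object so integer columns are not turned into floats
        # when another sheet lacks that column and NaN is filled in.
        frame.rename(columns={id_columns[name]: id_name}).astype(object)
        for name, frame in sheets.items()
    ]
    stacked = pd.concat(renamed, ignore_index=True, sort=False)
    # Every non-ID column becomes a list of the unique values seen for that ID.
    return stacked.groupby(id_name).agg(unique_values)


# Made-up stand-ins for two sheets whose ID columns are named differently.
sheets = {
    "sales": pd.DataFrame({
        "id": [156, 66],
        "car": ["Mercedes Benz", "Mercedes Benz"],
        "outlet": ["CC", "AA"],
    }),
    "service": pd.DataFrame({
        "ID_num": [156, 156],
        "car": ["Honda", "Mercedes Benz"],
        "code": [2031, 2401],
    }),
}
id_columns = {"sales": "id", "service": "ID_num"}

combined = combine_sheets(sheets, id_columns)
print(combined)
```

In this sketch every non-ID column ends up as a list, including ones that only ever hold a single value, and an ID with no value for a column gets an empty list. dict.fromkeys is used in place of set() only so the order of first appearance is kept; list(set(...)) would also remove the duplicates.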