I have three problems I am trying to solve. I have an Excel file with multiple sheets. The sheets have different column names, but each one has an ID column (named differently in each sheet), and I am trying to combine them into one large data frame for analysis. My main issue: if different sheets share a column name, I want to combine that column's values into a list for each row and apply set() to the list to remove the duplicates. This is the specific problem I am looking to solve, but given the bigger picture there might be a better function or approach for the whole task.
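To make the setup concrete, here is a minimal sketch of the first step: giving every sheet the same ID column name. The sheet names, the column name `CustomerID`, the `id_columns` mapping, and the helper `normalize_ids` are all hypothetical. The dict of DataFrames stands in for what `pd.read_excel(path, sheet_name=None)` returns (a dict mapping sheet name to DataFrame).

```python
import pandas as pd

# Hypothetical stand-in for:
#   sheets = pd.read_excel("workbook.xlsx", sheet_name=None)
# which returns a dict of {sheet name: DataFrame}.
sheets = {
    "sales": pd.DataFrame({"ID_number": [66, 156], "car": ["Mercedes Benz", "Honda"]}),
    "outlets": pd.DataFrame({"CustomerID": [66, 156], "outlet": ["AA", "CC"]}),
}

# Assumed mapping from each sheet to the name its ID column uses.
id_columns = {"sales": "ID_number", "outlets": "CustomerID"}


def normalize_ids(sheets, id_columns, target="ID_number"):
    """Rename each sheet's ID column to one shared name."""
    return [
        df.rename(columns={id_columns[name]: target})
        for name, df in sheets.items()
    ]


data_frames = normalize_ids(sheets, id_columns)
print([list(df.columns) for df in data_frames])
```

Once every frame shares one ID column name, the frames can be merged or stacked on it.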

I found this on here, but it doesn't show how to combine multiple columns with the same name into a single comma-joined value or a list: "Python: pandas merge multiple dataframes".

from functools import reduce
import pandas as pd

# Outer-merge every DataFrame in data_frames on the shared 'DATE' column
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames)

# To fill the values that are missing after the merge, chain fillna with the placeholder string you want
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['DATE'],
                                                how='outer'), data_frames).fillna('void')
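For illustration, a small sketch of why this merge does not do what I need when column names collide. The two tiny frames are made up. When both sides have a non-key column with the same name, `pd.merge` keeps both and appends its default suffixes `_x` and `_y` instead of combining the values.

```python
from functools import reduce

import pandas as pd

# Two made-up frames that share the key 'DATE' and also a 'car' column.
a = pd.DataFrame({"DATE": ["2012/05/12"], "car": ["Volvo"]})
b = pd.DataFrame({"DATE": ["2012/05/12"], "car": ["BMW"]})

merged = reduce(
    lambda left, right: pd.merge(left, right, on=["DATE"], how="outer"),
    [a, b],
)

# The colliding 'car' columns are kept side by side with suffixes.
print(list(merged.columns))  # ['DATE', 'car_x', 'car_y']
```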

After combining the dataframes, if a column name appears in more than one of them, I want to merge those columns' values into a list for each row, then apply set() to each row's list to remove the duplicates. I am looking for output like this:

           date        car                    outlet   code
ID_number
66         2012/05/12  [Mercedes Benz]        [AA]     [2061]
156        2012/10/24  [Mercedes Benz,Honda]  [CC,DD]  [2031,2401]
133        2012/06/11  [Volvo]                [BB]     [2032]
142        2012/02/09  [BMW,Lexus]            [CC,EE]  [2129,3045]
63         2012/03/16  [Mercedes Benz]        [AA]     [2161]
156        2012/11/29  [Volvo]                [CC]     [2171]
184        2012/08/25  [Mercedes Benz]        [BB]     [2122]
175        2012/05/30  [Volvo]                [CC]     [2089]
181        2012/04/12  [BMW]                  [AA]     [2080]
122        2012/10/28  Volvo                  AA       2189
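One possible way to get output of this shape, offered as a rough sketch rather than a definitive solution: stack the frames with `pd.concat` instead of merging them, group by the shared ID, and collect each column's values into a de-duplicated list. The helpers `unique_list` and `combine_frames` and the sample frames are hypothetical. It assumes the ID column has already been renamed to `ID_number` in every frame, and it groups on the ID alone (not on the date as well). It uses `dict.fromkeys` rather than `set()` so the first-seen order of values is kept.

```python
import pandas as pd


def unique_list(values):
    """Return the non-null values as a list, duplicates removed, first-seen order kept."""
    return list(dict.fromkeys(values.dropna()))


def combine_frames(data_frames, id_col="ID_number"):
    """Stack the frames and collapse each ID's values into de-duplicated lists."""
    # Columns with the same name line up; columns missing from a frame become NaN.
    stacked = pd.concat(data_frames, ignore_index=True, sort=False)
    grouped = stacked.groupby(id_col)
    value_cols = [col for col in stacked.columns if col != id_col]
    # SeriesGroupBy.apply yields one list per ID for each value column.
    return pd.DataFrame({col: grouped[col].apply(unique_list) for col in value_cols})


# Made-up sample frames with overlapping column names.
sheet_a = pd.DataFrame(
    {"ID_number": [156], "car": ["Mercedes Benz"], "outlet": ["CC"]}
)
sheet_b = pd.DataFrame(
    {"ID_number": [156, 133], "car": ["Honda", "Volvo"], "outlet": ["CC", "BB"]}
)

result = combine_frames([sheet_a, sheet_b])
print(result)
```

Here ID 156 ends up with both cars in its `car` list but only a single `CC` in `outlet`, because the repeated value is dropped.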
devlops_s
  • @jezrael Hi, I was wondering if you could help with this problem. It is similar to this one, but it places the rows into a list and uses set() to get rid of the duplicates: https://stackoverflow.com/questions/44327999/python-pandas-merge-multiple-dataframes – devlops_s Mar 25 '21 at 16:11
  • It would be really helpful if you could post some sample data for each `df` and expected output. – Mayank Porwal Mar 25 '21 at 16:11

0 Answers