I have a list of dataframes I want to merge together with the following requirements:
- If the dataframes have the same columns (i.e. if it's a data refresh of some sort), then they should be "concatenated" (no creating of suffixes such as _x or _y)
- The merging should be done on a list of indices so that no duplicate rows will be found
Example:
df1 = pd.DataFrame{'WeekCom':['2020-01-01','2020-01-02','2020-01-02','2020-01-03'],'Y':[2020,2020,2020,2020],'QT':[Q1,Q1,Q1,Q1],'M':['Jan','Jan','Jan','Jan'],'W':['W1','W1','W1','W2'],'Col_X':[0,1,1,2],'Col_Y':[3,0,0,1]}
df2 = pd.DataFrame{'WeekCom':['2020-01-02','2020-01-04'],'Y':[2020,2020],'QT':[Q1,Q1],'M':['Jan','Jan'],'W':['W1','W2'],'Col_X':[1,3],'Col_Z':[3,3]}
dataframe_list = [df1,df2]
current_file_type_merged_dataframe = reduce(lambda merged, df: pd.merge(merged, df, on=list_of_indices, how='outer'), dataframe_list)
This kinda works on the list of indices ('WeekCom','Y','H','QT',...) since I get a unique set of indices on the dataframe, but then the columns that have the same name from both dataframes, are "separated" by the suffix "_x" and "_y". I've already checked this SO thread Pandas Merging 101, but no sucess.
Thanks!