I have two data frames with identical columns. I would like to generate a new data frame containing only the values that differ between the corresponding columns of the two data frames.
Like this:
Note that I have done some pre-processing so that:
- All ids in df1 exist in df2; all ids in df2 exist in df1
- There are no NA/NaN values.
- All cells are strings with a minimum length of 1 and a max length of 3500.
- The names of the columns that need to be compared are stored in a list.
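Because the ids match one-to-one and there are no NaN values (NaN never compares equal to itself), an element-wise comparison after aligning both frames on `id` is well defined. The following is a minimal sketch, assuming `id` is unique within each frame; `differing_cells_mask` is a hypothetical helper name and the sample data is made up for illustration:

```python
import pandas as pd


def differing_cells_mask(df1: pd.DataFrame, df2: pd.DataFrame, col_list: list) -> pd.DataFrame:
    """Return a boolean frame indexed by id: True where df1 and df2 disagree."""
    # Index both frames by 'id' (assumed unique) and keep only the compared columns
    left = df1.set_index("id")[col_list].sort_index()
    right = df2.set_index("id")[col_list].sort_index()
    # Element-wise "not equal", aligned on both the index and the columns
    return left.ne(right)


# Hypothetical sample data: same ids, different row order, two changed cells
df1 = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"], "b": ["p", "q", "r"]})
df2 = pd.DataFrame({"id": [3, 1, 2], "a": ["z", "x", "Y"], "b": ["r", "Q", "q"]})

mask = differing_cells_mask(df1, df2, ["a", "b"])
print(mask[mask.any(axis=1)])  # only the ids with at least one differing cell
```

The mask tells you which cells differ; `mask.any(axis=1)` selects the ids that have at least one difference.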
I'm not sure how to get this cell-level information. I have tried iterating over each column and generating a data frame for each one, like this:
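```python
import pandas as pd

for v in col_list:
    # Rows whose (id, v) pair appears in only one frame have a differing value in column v
    m_df = pd.merge(df1, df2, on=['id', v], how='outer', indicator=True).query('_merge != "both"')
```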
However, I'm not sure how to combine these per-column data frames into a single data frame.
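One way to combine per-column results, sketched under the same assumptions (unique `id` per frame, column names in `col_list`), is to give every per-column piece the same generic columns and then stack them with `pd.concat`. Note that this swaps the `indicator`-based merge for a plain merge on `id` followed by a value comparison; `combine_per_column`, `column`, `df1_value` and `df2_value` are hypothetical names:

```python
import pandas as pd


def combine_per_column(df1: pd.DataFrame, df2: pd.DataFrame, col_list: list) -> pd.DataFrame:
    pieces = []
    for v in col_list:
        # Merge only this column on 'id'; suffixes keep the two sources apart
        m = pd.merge(df1[["id", v]], df2[["id", v]], on="id", suffixes=("_df1", "_df2"))
        # Generic column names let the pieces for different columns be stacked
        m = m.rename(columns={f"{v}_df1": "df1_value", f"{v}_df2": "df2_value"})
        m.insert(1, "column", v)
        pieces.append(m)
    combined = pd.concat(pieces, ignore_index=True)
    # Keep only the (id, column) pairs whose values differ
    return combined[combined["df1_value"] != combined["df2_value"]].reset_index(drop=True)


# Hypothetical sample data
df1 = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"], "b": ["p", "q", "r"]})
df2 = pd.DataFrame({"id": [3, 1, 2], "a": ["z", "x", "Y"], "b": ["r", "Q", "q"]})

out = combine_per_column(df1, df2, ["a", "b"])
print(out)
```

Each row of the result is one differing cell: its id, the column name, and the value from each frame. Filtering once after the concatenation also avoids concatenating empty pieces for columns without differences.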
This solution closely addresses my problem, but I don't know how to adapt it to my needs: https://stackoverflow.com/a/47112033/7987118
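A further option, independent of the linked answer, is `DataFrame.compare` (available since pandas 1.1), which keeps only the rows and columns that contain differences and shows the two values as `self`/`other` sub-columns. It requires identically labelled frames, so the sketch below first aligns both frames on `id`; `compare_frames` is a hypothetical name and the data is made up:

```python
import pandas as pd


def compare_frames(df1: pd.DataFrame, df2: pd.DataFrame, col_list: list) -> pd.DataFrame:
    # compare() needs identical labels: same ids and same columns in the same order
    left = df1.set_index("id")[col_list].sort_index()
    right = df2.set_index("id")[col_list].sort_index()
    # Default keep_shape=False drops rows/columns where everything matches
    return left.compare(right)


# Hypothetical sample data
df1 = pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"], "b": ["p", "q", "r"]})
df2 = pd.DataFrame({"id": [3, 1, 2], "a": ["z", "x", "Y"], "b": ["r", "Q", "q"]})

result = compare_frames(df1, df2, ["a", "b"])
print(result)
```

Within a kept row and column, cells whose values are equal are shown as NaN, so the output stays wide rather than listing one differing cell per row.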