I have 2 CSV files that were created at different dates that I want to compare and show what remained the same and what has changed. I don’t know where or how to begin, because when I try different merges and joins I run into the issue of the dataframes not being the same size.
df1 :
I ID Status
0 123 Active
1 124 Active
2 125 Inactive
3 126 Active
4 128 Inactive
df2:
I ID Status
0 123 Active
1 124 Inactive
2 125 Inactive
3 126 Active
4 128 Active
5 129 Active
6 130 Active
7 131 Active
8 132 Inactive
The goal is to highlight the status changes from df1 to df2 and remained constant from df1 to df2. Using the example above maybe I create 2 separate Dataframes that look something like this:
df3: (containing all new changes)
I ID Status
1 124 Inactive
4 128 Active
5 129 Active
6 130 Active
7 131 Active
df4: (containing all other ‘Active’ one that remained consistent)
I ID Status
0 123 Active
3 126 Active
To explain the logic behind each row and why it is included in df3, I will go row by row, because I don’t know if my example is clear enough
df3:
Index 1 - active to inactive
Index 4 - inactive to active
Index 5 - new active row
Index 6 - new active row
Index 7 - new active row
Index 8 - new inactive row
df4:
Index 0 - remained constant
Index 2 - remained constant
Index 3 - remained constant
I don’t know how to approach this because with merge and join I come across the error that the dataframes need to be the same size. Basically, what I want to do find what changed and what stayed the same from df1 to df2. I have 2 sample datasets that I am working with, they have more statuses but the idea is the same. Here is a google sheet with both csv files, the updated_values would be df2 and the original_values would be df1.