I have two data frames from a data export today and yesterday and want to find out if was any update on the data. The export varies and I cannot just perform a simple compare like with
df3 = pd.concat([df1,df2]).drop_duplicates(keep=False)
Because it will give me the unique lines from both df.
I need to find out if DF1 contains lines from DF2 that have different values. And I just want the rows from DF1 with the updates. The columns ID is the common identifier for rows in both DF. It is also unique in both df. But DF2 can contain more rows that DF1 and to make things worse the object type may differ as well. So for a proper compare both df should be converted to str only or the object type should be ignored for the compare.
data frame 1
id | name | last login | last order | total spent |
---|---|---|---|---|
1 | Sean | 1/1/2023 | 1/1/2023 | 432 |
2 | John | 1/1/2023 | 1/1/2023 | 98 |
3 | Jim | 1/1/2023 | 1/1/2023 | 123 |
data frame 2
id | name | last login | last order | total spent |
---|---|---|---|---|
1 | Sean | 1/1/2023 | 1/1/2023 | 432 |
2 | John | 1/6/2022 | 1/6/2022 | 2 |
3 | Jim | 1/1/2023 | 1/1/2023 | 123 |
4 | Joe | 7/8/2022 | 7/8/2022 | 1503 |
expected result:
id | name | last login | last order | total spent |
---|---|---|---|---|
2 | John | 1/1/2023 | 1/1/2023 | 98 |
I have tried to different ways to avoid looping thru the rows, but have not found a solution yet. Another hurdle is that pandas seems to keep changing the object type of columns while working with it(int become float, boolean become object), which make the compare even more difficult.
"Anti-merge" in pandas (Python) does not answer my question as it gives back unique lines from both data frames, not just df1. It is just a more complicated way for the method with concat I have in my question.