I have a Pandas df
which looks like this:
| | yyyy_mm_dd | id | product | status | is_50 | cnt |
|---|------------|----|------------|--------|-------|-----|
| | 2002-12-15 | 7 | prod_rs | 2 | 0 | 8 |
| | 2002-12-15 | 16 | prod_go | 2 | 0 | 1 |
| | 2002-12-15 | 16 | prod_mb | 2 | 0 | 3 |
| | 2002-12-15 | 29 | prod_er | 2 | 0 | 2 |
| | 2002-12-15 | 29 | prod_lm | 2 | 0 | 2 |
| | 2002-12-15 | 29 | prod_ops | 2 | 0 | 2 |
I also have a second dataframe which is similar:
| | id | product | cnt |
|---|----|------------|-----|
| | 7 | prod_rs | 8 |
| | 16 | prod_go | 1 |
| | 16 | prod_mb | 3 |
| | 29 | prod_er | 2 |
| | 29 | prod_lm | 2 |
| | 29 | prod_ops | 6 |
How can I create a third dataframe which will only store the rows which do not have an equal count? Based on the above, only the last row would be returned as the cnt for the id
/ product
combination differs. Example output:
| | id | product | cnt_df1 | cnt_df2 |
|---|----|---------|---------|---------|
| | 29 | prod_ops| 2 | 6 |
The second df
is one row larger in size so not all id
/ product
combinations may be present in both dataframes.
I've been looking at merge but I'm unsure how to use when the cnt
columns are not equal.