I have the following dataframes:

DF1:

     default_item_header   item_ser_num   from_date     to_date  tp_price
0                      2             10  2004-04-01  2004-04-16  15907.89
1                      2             20  2004-04-17  2004-05-02  15908.11
2                      2             30  2004-05-03  2004-05-18  15908.23
3                      2             40  2004-05-19  2004-06-03  15908.32
4                      2             50  2004-06-04  2004-06-19  15908.41
5                      2             60  2004-06-20  2004-07-05  15908.56
6                      2             70  2004-06-20  2004-07-05  15908.56
7                      2             80  2004-07-06  2004-07-21  15908.67

DF2:

     default_item_header   item_ser_num   from_date     to_date   tp_price
0                      2             80  2004-07-06  2004-07-21   15908.67
1                      2             90  2004-07-22  2004-08-06   15908.88

I want to isolate the row with item_ser_num 80 in DF2, which also appears in DF1.

I tried the pandas `.compare` method:

    isolate_data = df1.compare(df2, keep_equal=True)

but it raises this error:

    Can only compare identically-labeled DataFrame objects.

I think I am missing something obvious (only I can't spot it). Any help?

carla
  • Does this answer your question? [Pandas "Can only compare identically-labeled DataFrame objects" error](https://stackoverflow.com/questions/18548370/pandas-can-only-compare-identically-labeled-dataframe-objects-error) – aaossa Nov 03 '22 at 15:00
  • Checkout the second answer: seems like the index matters, so try dropping it – aaossa Nov 03 '22 at 15:01
  • Tried using `df1.reset_index(drop=True)`, but the indices still appear when printing, and I still get the same error: **Can only compare identically-labeled DataFrame objects.** – carla Nov 03 '22 at 15:24
  • @aaossa: As discussed [here](https://stackoverflow.com/a/57812527/), I tried using `isolate_data = pd.concat([df1,df2]).drop_duplicates(keep=False)`. But the resulting dataframe **isolate_data** contains all the rows from df1 and df2 combined. The stranger part is that the values of _tp_price_ in df2 are **rounded** (for example, to **15908.7**, without a _trailing zero_), whereas df1 retains the original value of **15908.67**. **Why?** – carla Nov 04 '22 at 03:28

1 Answer

I am now extracting the rows of df2 that are not present in df1 using:

    isolate_data = df2[~(df2['default_item_header'].isin(df1['default_item_header']) & df2['tp_price'].isin(df1['tp_price']))].reset_index(drop=True)

This way I have been able to extract only the row from df2 not appearing in df1.

Ref. this answer that helped me achieve what I am trying to do.
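
One caveat with the `isin` filter above: it tests each column independently, so a df2 row is treated as "present" whenever its `default_item_header` and its `tp_price` each occur somewhere in df1, even if they occur in different rows. A row-wise alternative, as a minimal sketch, is a left merge with `indicator=True`. The sample frames below are a cut-down copy of the data in the question, and the names `cols`, `merged`, `common` and `only_in_df2` are just illustrative. It also assumes the column dtypes match in both frames (the merge compares `tp_price` for exact equality).

```python
import pandas as pd

cols = ["default_item_header", "item_ser_num", "from_date", "to_date", "tp_price"]

# Cut-down copies of the frames from the question (assumed dtypes: int, str, float).
df1 = pd.DataFrame(
    [
        [2, 70, "2004-06-20", "2004-07-05", 15908.56],
        [2, 80, "2004-07-06", "2004-07-21", 15908.67],
    ],
    columns=cols,
)
df2 = pd.DataFrame(
    [
        [2, 80, "2004-07-06", "2004-07-21", 15908.67],
        [2, 90, "2004-07-22", "2004-08-06", 15908.88],
    ],
    columns=cols,
)

# Left merge on all columns; indicator=True adds a `_merge` column that is
# "both" for df2 rows also found in df1 and "left_only" for the rest.
# drop_duplicates() on df1 keeps repeated df1 rows from multiplying df2 rows.
merged = df2.merge(df1.drop_duplicates(), how="left", on=cols, indicator=True)

# Rows of df2 that also appear in df1 (what the question asks for).
common = merged.loc[merged["_merge"] == "both", cols].reset_index(drop=True)

# Rows of df2 that do not appear in df1 (what the isin filter extracts).
only_in_df2 = merged.loc[merged["_merge"] == "left_only", cols].reset_index(drop=True)

print(common)
print(only_in_df2)
```

If the two frames hold `tp_price` with different precision, as the rounding mentioned in the comments might suggest, the exact match on that column would fail, so it may be worth checking `df1.dtypes` and `df2.dtypes` first.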

carla