Filter the rows of a dataframe based on whether the values in some columns have changed

Question

I have a dataframe and would like to get the 'item_id' values whose 'item_price' never varies:

         date  date_block_num  shop_id  item_id  item_price  item_cnt_day
0  02.01.2013               0       59    22154      999.00           1.0
1  03.01.2013               0       25     2552      899.00           1.0
2  05.01.2013               0       25     2552      899.00          -1.0
3  06.01.2013               0       25     2555     1709.05           1.0
4  15.01.2013               0       25     2555     1099.00           1.0

In this example, the result should be 22154 and 2552.

So I tried:

d = {}
for row in transactions.iterrows():
    try:
        # Let's make sure that the prices of item_id have not changed
        d[row.item_id]['item_price'] != row.item_price:
            d.pop(row.item_id, None)
    # otherwise, the item_id is not yet a dictionary key
    except KeyError:
        d[row.item_id]['item_price'] = row.item_price

But I get:

  File "<ipython-input-22-06e70f158952>", line 5
    d[row["item_id"]]['item_price'] != row["item_price"]:
                                                         ^
SyntaxError: invalid syntax

cs95 · Accepted Answer · 2020-10-11T09:24:40.187

There's no need to iterate; let's use groupby and nunique here:

items = df.groupby('item_id')['item_price'].nunique()
items[items == 1].index.tolist()
# [2552, 22154]
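A self-contained sketch of the approach above. The question only shows the printed frame, so the DataFrame constructor call that rebuilds the five sample rows is an assumption:

```python
import pandas as pd

# Rebuild the five sample rows shown in the question.
df = pd.DataFrame({
    'date': ['02.01.2013', '03.01.2013', '05.01.2013', '06.01.2013', '15.01.2013'],
    'date_block_num': [0, 0, 0, 0, 0],
    'shop_id': [59, 25, 25, 25, 25],
    'item_id': [22154, 2552, 2552, 2555, 2555],
    'item_price': [999.00, 899.00, 899.00, 1709.05, 1099.00],
    'item_cnt_day': [1.0, 1.0, -1.0, 1.0, 1.0],
})

# Count distinct prices per item; an item whose price never changes has exactly one.
items = df.groupby('item_id')['item_price'].nunique()
stable_ids = items[items == 1].index.tolist()
print(stable_ids)  # [2552, 22154]
```

groupby sorts the keys by default, which is why 2552 comes before 22154 here.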

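Note: item_price should preferably be whole numbers (for example, prices stored as integer cents). nunique compares floats exactly, so two prices that differ only by floating point error are counted as distinct and the item is wrongly treated as having a changed price.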
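One way to apply that note, sketched here under the assumption that prices carry at most two decimal places, is to round them into integer cents before counting. The item IDs 1 and 2 are hypothetical toy data:

```python
import pandas as pd

# Item 1 has two prices meant to be equal that differ by floating point error.
df = pd.DataFrame({
    'item_id': [1, 1, 2],
    'item_price': [0.1 + 0.2, 0.3, 5.0],
})

# Comparing raw floats treats 0.30000000000000004 and 0.3 as different values.
raw = df.groupby('item_id')['item_price'].nunique()

# Assumes at most two decimal places: scale, round, then store as integer cents.
cents = (df['item_price'] * 100).round().astype('int64')
fixed = cents.groupby(df['item_id']).nunique()

print(raw.to_dict())    # {1: 2, 2: 1}
print(fixed.to_dict())  # {1: 1, 2: 1}
```

If the data can have more decimal places, the scale factor would need to change accordingly.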

Here's a similar alternative, which uses transform to build a row-level boolean mask and then selects the matching IDs:

m = df.groupby('item_id')['item_price'].transform('nunique').eq(1)
df.loc[m, 'item_id'].unique()
# array([22154,  2552])
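As for the SyntaxError in the question: it comes from the missing if before the comparison on line 5. The loop has further problems. iterrows() yields (index, Series) pairs, not objects with attribute access, and it upcasts mixed int/float rows to float. The except branch raises KeyError again because d[row.item_id] does not exist yet, and popping a changed item lets a later row re-add it. A corrected loop is sketched below. It swaps in itertuples(), which yields namedtuples so row.item_id works, and it tracks changed IDs in a separate set. Both of these choices are assumptions, not part of the original answer. It is generally slower than the groupby answer on large frames:

```python
import pandas as pd

# Only the two columns the loop needs, copied from the question's sample.
df = pd.DataFrame({
    'item_id': [22154, 2552, 2552, 2555, 2555],
    'item_price': [999.00, 899.00, 899.00, 1709.05, 1099.00],
})

first_price = {}  # item_id -> first price seen for that item
changed = set()   # item_ids whose price differed at least once

# itertuples() yields namedtuples, so attribute access like row.item_id works.
for row in df.itertuples(index=False):
    if row.item_id not in first_price:
        first_price[row.item_id] = row.item_price
    elif first_price[row.item_id] != row.item_price:
        changed.add(row.item_id)

# Keep items in order of first appearance, dropping any whose price changed.
stable_ids = [item for item in first_price if item not in changed]
print(stable_ids)  # [22154, 2552]
```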
