
I have a dataframe and would like to get every 'item_id' whose 'item_price' never varies:

         date  date_block_num  shop_id  item_id  item_price  item_cnt_day
0  02.01.2013               0       59    22154      999.00           1.0
1  03.01.2013               0       25     2552      899.00           1.0
2  05.01.2013               0       25     2552      899.00          -1.0
3  06.01.2013               0       25     2555     1709.05           1.0
4  15.01.2013               0       25     2555     1099.00           1.0

For example, here the result should be 22154 and 2552.

So I tried:

d = {}
for row in transactions.iterrows():
    try:
        # Let's make sure that the prices of item_id have not changed
        d[row.item_id]['item_price'] != row.item_price:
            d.pop(row.item_id, None)
    # in the other case is that the item_id is no longer a dictionary key
    except KeyError:
        d[row.item_id]['item_price'] = row.item_price

But I get:

  File "<ipython-input-22-06e70f158952>", line 5
    d[row["item_id"]]['item_price'] != row["item_price"]:
                                                         ^
SyntaxError: invalid syntax

1 Answer


There's no need to iterate; let's use groupby and nunique here:

items = df.groupby('item_id')['item_price'].nunique()
items[items == 1].index.tolist()
# [2552, 22154]

Note: item_price values should preferably be whole numbers, since floating-point inaccuracies can make equal prices compare as different and give incorrect results.
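One way to sidestep the floating-point caveat is to compare prices as integer cents. The following is only a sketch: it rebuilds the sample rows from the question, and it assumes prices never have more than two decimal places (the name price_cents is introduced here for illustration).

```python
import pandas as pd

# Sample rows copied from the question (only the two relevant columns).
df = pd.DataFrame({
    'item_id': [22154, 2552, 2552, 2555, 2555],
    'item_price': [999.00, 899.00, 899.00, 1709.05, 1099.00],
})

# Assumption: at most two decimal places, so rounding price * 100
# gives an exact integer number of cents.
price_cents = (df['item_price'] * 100).round().astype('int64')

# Count distinct integer prices per item_id and keep the constant ones.
items = price_cents.groupby(df['item_id']).nunique()
constant_ids = items[items == 1].index.tolist()
print(constant_ids)  # [2552, 22154]
```

If prices can carry more decimal places, the scaling factor would need to change accordingly.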


Here's a similar alternative:

m = df.groupby('item_id')['item_price'].transform('nunique').eq(1)
df.loc[m, 'item_id'].unique()
# array([22154,  2552])
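For comparison, here is how the loop from the question could be repaired if iteration were really wanted. The SyntaxError comes from the comparison line ending in a colon without a leading if. Two further problems would show up afterwards: iterrows yields (index, row) tuples rather than rows, and assigning to d[row.item_id]['item_price'] inside the except block raises KeyError again because d[row.item_id] does not exist yet. This is a minimal sketch, not a recommendation over groupby; the names prices and changed are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the two relevant columns).
transactions = pd.DataFrame({
    'item_id': [22154, 2552, 2552, 2555, 2555],
    'item_price': [999.00, 899.00, 899.00, 1709.05, 1099.00],
})

prices = {}      # item_id -> first price seen for that item
changed = set()  # item_ids whose price differed at least once

# itertuples yields namedtuples, so row.item_id works as attribute access.
for row in transactions.itertuples(index=False):
    if row.item_id not in prices:
        prices[row.item_id] = row.item_price
    elif prices[row.item_id] != row.item_price:
        changed.add(row.item_id)

# Keep the ids whose price never changed, in order of first appearance.
constant_ids = [item_id for item_id in prices if item_id not in changed]
print(constant_ids)  # [22154, 2552]
```

Tracking changed ids in a separate set avoids the original pop approach, where an item removed after a price change would be re-added as "constant" the next time it appeared.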