1

I have a data frame with the variables STORE, SALES_DT, REGISTER, TRANS_ID, and PRODUCT.

Each unique combination of STORE, SALES_DT, REGISTER, and TRANS_ID represents a unique transaction, not just the TRANS_ID. For example, there could be a transaction with the same store, date, and transaction id, and product but at a different register. Any combination is possible. A very small portion of the data frame could be...

STORE   SALES_DT      REGISTER    TRANS_ID    PRODUCT
1       2017-04-12    3           1234        Milk
1       2017-04-12    3           1234        Milk
1       2014-06-01    14          8901        Eggs
23      2014-06-09    1           4597        Eggs
48      2016-01-25    2           1234        Bread
48      2015-12-09    2           8901        Milk

How do I make a count of unique transactions for each PRODUCT that would output something like this?

PRODUCT    
Milk     :2
Eggs     :2
Bread    :1

I have tried:

cart <- group_by(dataframe, STORE, SLS_DT, REGISTER, TRANS_ID)
summary(cart$PRODUCT)

but it seems that it is ignoring the group_by in the count since it outputs:

PRODUCT
MILK    :3
EGGS    :2
BREAD   :1
Alina
  • 23
  • 4
  • Possible duplicate of [Counting unique / distinct values by group in a data frame](https://stackoverflow.com/questions/12840294/counting-unique-distinct-values-by-group-in-a-data-frame) – IceCreamToucan Jul 06 '18 at 19:15

1 Answers1

1

Use n_distinct to find the number of uniquie transactions

dataframe %>% group_by(PRODUCT) %>% 
  summarize(n=n_distinct(TRANS_ID))
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • It's possible that there can be a different transaction with the same transaction id, store, sales date, and product, but be at a different register. Would n_distinct(STORE, SALES_DT, REGISTER, TRANS_ID) work? – Alina Jul 06 '18 at 19:13
  • 1
    @Alina Yes, `n_distinct` can take multiple arguments. – IceCreamToucan Jul 06 '18 at 19:18