8

I have an Orders database for an online shopping platform.

The table I'm working with looks like this, where each line corresponds to one customer/item/date.

OrderHistory <- data.frame(date=c("2015-02-01", "2015-03-01", "2015-04-01", "2015-03-01", "2015-04-01", "2015-05-01", "2015-05-01"), 
            customer=c("A","A","A","B","B","B","B"),
            item=c("Candy", "Coffee", "Coffee", "Candy", "Candy", "Candy", "Coffee" ))

What I would like to get is a running count of the number of times each member has ordered the specific item so I can run analysis on which items are ordered repeatedly by the same customers and which ones are ordered once and never again.

The output would look like

out <- data.frame(date=c("2015-02-01", "2015-03-01", "2015-04-01", "2015-03-01", "2015-04-01", "2015-05-01", "2015-05-01"), 
              member=c("A","A","A","B","B","B","B"),
              item=c("Candy", "Coffee", "Coffee", "Candy", "Candy", "Candy", "Coffee" ),
              count=c(1,1,2,1,2,3,1))

I would love a dplyr solution but I'm open to any suggestions! The exact items on the platform are constantly changing, so the solution would have to be dynamic to account for that.

SFuj
  • 925
  • 1
  • 9
  • 14

2 Answers2

24

I believe this should give you what you want

library(dplyr)
OrderHistory %>%
    group_by(customer, item) %>%
    mutate(count = seq(n()))

Source: local data frame [7 x 4]
Groups: customer, item

        date customer   item count
1 2015-02-01        A  Candy     1
2 2015-03-01        A Coffee     1
3 2015-04-01        A Coffee     2
4 2015-03-01        B  Candy     1
5 2015-04-01        B  Candy     2
6 2015-05-01        B  Candy     3
7 2015-05-01        B Coffee     1
cdeterman
  • 19,630
  • 7
  • 76
  • 100
2

You've gotten your "dplyr" answer, but a straightforward way in base R is to use ave:

ave(rep(1, nrow(OrderHistory)), OrderHistory[-1], FUN = seq_along)
# [1] 1 1 2 1 2 3 1

You can also use getanID from my "splitstackshape" package:

library(splitstackshape)
getanID(OrderHistory, c("customer", "item"))
#          date customer   item .id
# 1: 2015-02-01        A  Candy   1
# 2: 2015-03-01        A Coffee   1
# 3: 2015-04-01        A Coffee   2
# 4: 2015-03-01        B  Candy   1
# 5: 2015-04-01        B  Candy   2
# 6: 2015-05-01        B  Candy   3
# 7: 2015-05-01        B Coffee   1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485