I'd like to index customer transactions in an R dataframe so that I can easily identify, say, the third transaction that a particular customer has made. For example, if I have the following data frame (ordered by customer and transaction date):
transactions = data.frame(CUST.ID = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
DATE = as.Date(c("2009-07-02", "2013-08-15", "2010-01-02", "2004-03-05",
"2006-02-03", "2007-01-01", "2004-03-05", "2006-02-03", "2007-01-01")),
AMOUNT = c(5, 9, 21, 34, 76, 1, 100, 23, 10))
> transactions
CUST.ID DATE AMOUNT
1 1 2009-07-02 5
2 1 2013-08-15 9
3 2 2010-01-02 21
4 2 2004-03-05 34
5 2 2006-02-03 76
6 2 2007-01-01 1
7 3 2004-03-05 100
8 3 2006-02-03 23
9 3 2007-01-01 10
I can clearly see that customer 1 has made 2 transactions, customer 2 has made 4, etc.
What I would like is to index these transactions by customer, creating a new column in my dataframe. The following code achieves what I want:
transactions$COUNTER = 1
transactions$CUSTOMER.TRANS.NO = unlist(aggregate(COUNTER ~ CUST.ID,
data = transactions,
function(x) {rank(x, ties.method = "first")})[, 2])
transactions$COUNTER = NULL
> transactions
CUST.ID DATE AMOUNT CUSTOMER.TRANS.NO
1 1 2009-07-02 5 1
2 1 2013-08-15 9 2
3 2 2010-01-02 21 1
4 2 2004-03-05 34 2
5 2 2006-02-03 76 3
6 2 2007-01-01 1 4
7 3 2004-03-05 100 1
8 3 2006-02-03 23 2
9 3 2007-01-01 10 3
Now the first transaction for each customer is labelled 1, the second 2, etc.
So I've got what I want but it's such a horrible piece of code, creating a list and separating, it's just so ugly. Is anyone with more experience than me able to come up with a better solution?