0

I have a df sorted by product id prodID and by date Date. I need to add a column that shows a kind-of cumulative index of how many times each prodID is in the df. For example: if a proID appears only once, the index will be 1 on that row. If another prodID appears in 3 rows (which are consecutive in the df, for the df is sorted), then the index should be 1 in the first row for that very prodID, then 2, then 3 in the following rows. Basically I need that my initial df:

 initial.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"),     rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83,     3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29,     38.33, 43.3, 39.53, 30.58)), class = "data.frame", row.names = c(NA, -12L), .Names = c("prodID", "Date", "status", "rating", "over"))

becomes

new.df <- structure(list(prodID = c("009hpOpzwl", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00An0zNeEQ", "00DtU3Bk6O", "00DtU3Bk6O", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk", "00FyjrH1kk"), Date = c("2012-06", "2014-09", "2014-09", "2014-09", "2014-09", "2001-11", "2001-11", "2002-11", "2002-12", "2003-01", "2003-02", "2003-03"), status = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 5L, 5L), .Label = c("rare", "occasional", "amateur", "connoisseur", "expert", "fool"), class = "factor"),     rating = c(2.5, 4.7, 4.7, 4.7, 4.7, 4.4, 4.4, 3.5, 3.83,     3.36, 3.53, 3.78), over = c(68, 49, 49, 49, 49, 22, 22, 29,     38.33, 43.3, 39.53, 30.58), index = c(1, 1, 2, 3, 4, 1, 2,     1, 2, 3, 4, 5)), .Names = c("prodID", "Date", "status", "rating", "over", "index"), row.names = c(NA, -12L), class = "data.frame")

Thank you in advance for any suggestion

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Sal
  • 117
  • 12

4 Answers4

3

You can use the ave function to acheive this, if the data are guaranteed to be sorted as you claim:

initial.df$index <- ave(initial.df$prodID, initial.df$prodID, FUN=function(x) seq(along=x))
Gavin Kelly
  • 2,374
  • 1
  • 10
  • 13
3

Just for completeness, this is very straightforward operation when using data.table and will be both efficient, short syntax and creates the column by reference, simply:

library(data.table)
setDT(initial.df)[, index := seq_len(.N), prodID]
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
2

If this question is not being closed as a duplicate of some other and we already have the data.table answer, here's the dplyr edition:

library(dplyr)
df %>% group_by(prodID) %>% mutate(index = row_number())
talat
  • 68,970
  • 21
  • 126
  • 157
1

What about

do.call(rbind, lapply(split(initial.df, initial.df$prodID), function(x) cbind(x, 1:nrow(x))))
johannes
  • 14,043
  • 5
  • 40
  • 51