6

What is the easiest way to count the occurrences of a an element on a vector or data.frame at every grouop?
I don't mean just counting the total (as other stackoverflow questions ask) but giving a different number to every succesive occurence.

for example for this simple dataframe: (but I will work with dataframes with more columns)

mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))

I've found this solution:

cbind(mydata,myorder=ave(rep(1,nrow(mydata)),mydata$A, FUN=cumsum))   

and here the result:

 A myorder  
 A       1  
 A       2  
 A       3  
 B       1  
 B       2  
 A       4  
 A       5  

Isn't there any single command to do it?. Or using an specialized package?

I want it to later use tidyr's spread() function.

My question is not the same than Is there an aggregate FUN option to count occurrences? because I don't want to know the total number of occurrencies at the end but the cumulative occurencies till every element.

OK, my problem is a little bit more complex

mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))

I only know to solve the first example I wrote above. But what happens when I want it also by a second grouping variable? something like occurrencies(letter) by group.

group letter  "occurencies within group"  
 x      A       1  
 x      A       2  
 x      A       3  
 x      B       1  
 y      B       1  
 y      A       1  
 y      A       2  

I've found the way with

ave(rep(1,nrow(mydata)),list(mydata$group, mydata$letter), FUN=cumsum)
though it shoould be something easier.

skan
  • 7,423
  • 14
  • 59
  • 96

1 Answers1

10

Using data.table

library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]

The by argument makes the table be dealt with within the groups of the column called A. .N is the number of rows within that group (if the by argument was empty it would be the number of rows in the table), so for each sub-table, each row is indexed from 1 to the number of rows in that sub-table.

mydata
   group letter myorder
1:     x      A       1
2:     x      A       2
3:     x      A       3
4:     x      B       1
5:     y      B       1
6:     y      A       1
7:     y      A       2

or a dplyr solution which is pretty much the same

mydata %>% 
  group_by(group, letter) %>% 
  mutate(myorder = 1:n())
Akhil Nair
  • 3,144
  • 1
  • 17
  • 32
  • Good. What if I also want it "by" a second grouping variable? – skan Sep 15 '15 at 14:40
  • @skan `by = list(A, B)`. A common shortcut for writing `list` in `data.table` is dot syntax, i.e. `by = .(A, B)` where `B` is the second variable. Sorry, I don't have a `tidyr` soln. – Akhil Nair Sep 15 '15 at 14:41
  • 1
    Or `mydata[, myorder := 1:.N, by = .(group, letter)]` in the case specified after the edit. – Akhil Nair Sep 15 '15 at 14:58
  • Good, Could you add it to the main answer, please? And if you add a tidyr , dplyr o base solution it would be even better – skan Sep 15 '15 at 16:05
  • Anything with by() or ave() ? – skan Sep 15 '15 at 16:12
  • I don't use `tidyr` unfortunately, so I'm not familiar with the syntax. – Akhil Nair Sep 15 '15 at 16:13
  • 1
    @skan - with `ave()` you could do `transform(mydata, count = ave(rep(1, nrow(mydata)), A, FUN = seq_along))` similar to what you show in your post. Also, `tidyr` is not for this kind of work. It's for data cleaning. – Rich Scriven Sep 15 '15 at 16:19
  • 1
    And on the other data `transform(mydata, count = ave(rep(1, nrow(mydata)), letter, group, FUN = seq_along))` – Rich Scriven Sep 15 '15 at 16:23