I have the following data.frame:

   x y   
1 t1 5                  
2 t2 2   
3 t2 7  
4 t3 9  
5 t1 6 

How can I add a column with the occurrence number of each value in the first column, like below?

   x y occ
1 t1 5   1
2 t2 2   1
3 t2 7   2
4 t3 9   1
5 t1 6   2
Jilber Urbina
fp4me

2 Answers

Not 100% sure but is this what you mean?

> my.df <- data.frame(x=c("t1","t2","t2","t3","t1"), y=c(5,2,7,9,6))
> my.df <- data.frame(x=my.df$x,
+                     y=my.df$y,
+                     occ=sapply(1:nrow(my.df), function(i) sum(my.df$x[1:i] == my.df$x[i])))

> my.df
   x y occ
1 t1 5   1
2 t2 2   1
3 t2 7   2
4 t3 9   1
5 t1 6   2
fotNelton

Use sequence and rle on your sorted data.frame:

my.df <- data.frame(x=c("t1","t2","t2","t3","t1"), y=c(5,2,7,9,6))
# Order by x
my.df = my.df[order(my.df$x), ]
my.df$occ = sequence(rle(as.vector(my.df$x))$lengths)
my.df
#    x y occ
# 1 t1 5   1
# 5 t1 6   2
# 2 t2 2   1
# 3 t2 7   2
# 4 t3 9   1
# Uncomment if you want to go back to the original row order
# (row names are character strings, so convert them to numbers before ordering)
# my.df[order(as.numeric(rownames(my.df))), ]
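
To see why this works, it can help to look at the intermediate values. Here is a minimal sketch, assuming the same example data after ordering (the names `x_sorted`, `runs`, and `occ` are only for illustration):

```r
# The x column as it looks after ordering the data.frame by x
x_sorted <- c("t1", "t1", "t2", "t2", "t3")

# rle() collapses consecutive equal values into runs
runs <- rle(x_sorted)
runs$values   # the distinct values, one per run
runs$lengths  # how many times each value repeats in a row

# sequence() expands each run length n into 1, 2, ..., n,
# which is exactly the occurrence counter within each group
occ <- sequence(runs$lengths)
occ
```

This is also why the data has to be sorted first: `rle` only counts consecutive repeats, so the two `t1` rows must be adjacent to end up in the same run.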

Update: Something I learned today

I had seen, but not used, the ave function. It looks like you can do this without reordering your original data.frame:

my.df$occ = ave(as.numeric(my.df$x), as.numeric(my.df$x), FUN=seq_along)
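
One caveat, as far as I can tell: `as.numeric(my.df$x)` only gives group codes when `x` is a factor. Since R 4.0.0, `data.frame()` keeps strings as character by default, so that conversion would return `NA`s. A minimal sketch that avoids the conversion and should behave the same for character and factor input (the helper name `make_occ` is just for illustration):

```r
# ave() splits the first argument by the grouping variable, applies FUN
# to each group, and puts the results back in the original positions.
# Using seq_along(x) as the first argument avoids converting x to numbers.
make_occ <- function(x) ave(seq_along(x), x, FUN = seq_along)

x_chr <- c("t1", "t2", "t2", "t3", "t1")
occ_chr <- make_occ(x_chr)          # character input
occ_fac <- make_occ(factor(x_chr))  # factor input

my.df <- data.frame(x = x_chr, y = c(5, 2, 7, 9, 6))
my.df$occ <- make_occ(my.df$x)
my.df
```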
A5C1D2H2I1M1N2O1R2T1
  • Son of pup that's nice. I knew you could use `ave` somehow but was trying `sum` rather than `seq_along`. Nice +1 – Tyler Rinker Jul 29 '12 at 17:41
  • Ahh now I got it you can do it with cumsum too: `ave(rep(1, length(my.df$x)), as.numeric(my.df$x), FUN=cumsum)` – Tyler Rinker Jul 29 '12 at 17:44
  • I was pretty excited about finding it too.... But I have a question: Is it only a "pup" because I pretty much just re-posted an already available solution? ;-) – A5C1D2H2I1M1N2O1R2T1 Jul 29 '12 at 17:49
  • I was trying to keep it PG for the kids. Plus you can shorten it even more to: `ave(rep(1, length(my.df$x)), my.df$x, FUN=cumsum)` – Tyler Rinker Jul 29 '12 at 17:58
  • Nice. I'll look at this (`ave`, at least.... not this problem) more in the morning. Using R near midnight in warm and sticky India doesn't always compute nicely. – A5C1D2H2I1M1N2O1R2T1 Jul 29 '12 at 18:03