7

In R, I have a data frame whose observations are described by several values, one of which is a factor. I have sorted the dataset by this factor and would like to add a column that numbers the observations within each level of the factor, e.g.

factor   obsnum
a        1
a        2
a        3
b        1
b        2
b        3
b        4
c        1
c        2
...

In SAS I do it with something like:

data logs.full;
    set logs.full;
    count + 1;
    by cookie;
    if first.cookie then count = 1;
run;

How can I achieve that in R?

Thanks,

twowo

3 Answers

14

Use rle (run length encoding) and sequence:

x <- c("a", "a", "a", "b", "b", "b", "b", "c", "c")

data.frame(
    x=x,
    obsnum = sequence(rle(x)$lengths) 
)

  x obsnum
1 a      1
2 a      2
3 a      3
4 b      1
5 b      2
6 b      3
7 b      4
8 c      1
9 c      2
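
Since the question mentions a factor column, here is a minimal sketch of the same idea applied to a data frame. The data frame dfr and its columns factor and value are made up for illustration. rle() raises an error when given a factor, so the column is converted with as.character() first; like the example above, this assumes the rows are already sorted by the factor.

```r
# Hypothetical example data: a factor column plus one other value column.
dfr <- data.frame(
    factor = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c")),
    value  = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# rle() does not accept factors, so convert to character first.
runs <- rle(as.character(dfr$factor))

# sequence() expands each run length n into 1..n and concatenates the results.
dfr$obsnum <- sequence(runs$lengths)

print(dfr)
```

Because rle() counts consecutive runs, an unsorted column would restart the counter every time the value changes, not once per level.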
Andrie
5

Here is a ddply() solution using the plyr package:

dataset <- data.frame(x = c("a", "a", "a", "b", "b", "b", "b", "c", "c"))
library(plyr)
ddply(dataset, .(x), function(z){
  data.frame(obsnum = seq_along(z$x))
})
Thierry
3

One solution using base R, assuming your data is in a data.frame named dfr:

# Assumes dfr is already sorted (grouped) by factor.
dfr$cnt <- do.call(c, lapply(unique(dfr$factor), function(curf) {
  seq_len(sum(dfr$factor == curf))  # 1..n for the n rows at this level
}))

There are likely better solutions (e.g. using the plyr package and its ddply function), but this should work as long as the data is sorted by the factor.
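
One possible alternative in base R is ave(), which applies a function within each group and puts the results back in the original row order. The sketch below uses a made-up, deliberately unsorted dfr to show that this approach does not depend on the sort order; the column name cnt follows the code above.

```r
# Hypothetical example data, intentionally not sorted by the factor.
dfr <- data.frame(factor = factor(c("b", "a", "b", "a", "c", "b")))

# ave() splits the row indices by level, applies seq_along() to each group,
# and writes the per-group counters back at the original row positions.
dfr$cnt <- ave(seq_along(dfr$factor), dfr$factor, FUN = seq_along)

print(dfr)
```

On data that is already sorted by the factor, this should give the same numbering as the lapply() version.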

Nick Sabbe