2

I have a data frame to which I want to add an index, e.g. 1...n, for each level of a factor. Here is an example with some dummy data.

factor
a        
a         
a         
a        
a        
b        
b        
b        
b        
b
c
c
c
c

I would like to add a column that holds an index from 1 to n for each factor level separately. The resulting data frame would look like:

factor  index
a        1
a        2 
a        3 
a        4
a        5
b        1
b        2
b        3
b        4
b        5 
c        1
c        2
c        3
c        4

Can anyone explain how to do so? Thanks in advance.

989
ThallyHo

4 Answers

15

You could use the ave function:

your_data <- data.frame(
     factor=factor(rep(letters[1:3], times = c(5,5,4)))
)
your_data$index <- ave(rep(NA, nrow(your_data)), your_data$factor, FUN=seq_along)
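A minimal sketch of why the ave approach does not depend on row order, assuming a small hand-made factor (the names `f` and `idx` are illustrative, not from the original post). Because ave splits by group and writes each result back to the original positions, the per-level counter should stay aligned even when the levels are interleaved:

```r
# Hypothetical unsorted factor: levels are interleaved on purpose
f <- factor(c("b", "a", "b", "c", "a"))

# seq_along(f) is only a placeholder vector of the right length;
# ave() replaces each group's entries with seq_along() of that group
idx <- ave(seq_along(f), f, FUN = seq_along)

print(data.frame(factor = f, index = idx))
# idx is 1 1 2 1 2: each level counts up independently, in row order
```

Using `seq_along(f)` instead of `rep(NA, ...)` as the first argument is a stylistic choice here; either supplies a vector of the correct length.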
Marek
3

One way is:

unlist(lapply(split(x, x), seq_along))

where x is your factor as a vector.

R> x <- factor(rep(letters[1:3], times = c(5,5,4))) ## your data
R> data.frame(factor = x, index = unlist(lapply(split(x, x), seq_along), 
+             use.names = FALSE))
   factor index
1       a     1
2       a     2
3       a     3
4       a     4
5       a     5
6       b     1
7       b     2
8       b     3
9       b     4
10      b     5
11      c     1
12      c     2
13      c     3
14      c     4

Another way, on a similar theme, is to use table() and seq_len():

unlist(sapply(table(x), seq_len), use.names = FALSE)

And another way is to use the run-length encoding via rle():

R> rle(as.character(x))$lengths
[1] 5 5 4

which we can plug into the sapply() code instead of the table() call:

R> unlist(sapply(rle(as.character(x))$lengths, seq_len), use.names = FALSE)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4
Gavin Simpson
  • This method will fail if `x` is mixed. Try `x<-sample(x)` and run your code. – Marek May 27 '11 at 11:06
  • @Marek Given the OP showed sorted data, I don't think there is anything wrong with my supplied answer(s). Or are we supposed to second guess what the OP really wants now? ;-) Anyway, `x <- sort(sample(x))` would solve the problem :-) – Gavin Simpson May 27 '11 at 11:18
  • Thanks, I used the sapply option with the within function and it worked perfectly. Cheers. – ThallyHo May 27 '11 at 13:39
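The ordering caveat raised in the comments can be checked with a small sketch. The vector below and the names `by_table`, `by_ave`, `xs` and `sorted_idx` are assumptions made for illustration. The table()-based index is generated in level order, so it only lines up with the rows once the data are sorted; lapply is used rather than sapply so that the intermediate result stays a list whatever the group sizes are.

```r
# Hypothetical unsorted factor
x <- factor(c("b", "a", "b", "c", "a"))

# Counts per level, expanded in level order (a, a, b, b, c)
by_table <- unlist(lapply(table(x), seq_len), use.names = FALSE)

# Group-wise counter written back to the original row positions
by_ave <- ave(seq_along(x), x, FUN = seq_along)

print(by_table)  # 1 2 1 2 1, aligned with sort(x), not with x
print(by_ave)    # 1 1 2 1 2, aligned with x as given

# After sorting, the two approaches agree
xs <- sort(x)
sorted_idx <- unlist(lapply(table(xs), seq_len), use.names = FALSE)
print(identical(sorted_idx, ave(seq_along(xs), xs, FUN = seq_along)))
```

If the rows cannot be reordered, the group-wise version is the safer of the two in this sketch.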
1

Try the following function:

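 facSeq <- function(x){
     x.l <- length(x)              # number of observations
     x.f.l <- length(levels(x))    # number of factor levels
     # Build one column of running counts per level, then pick,
     # for each row, the entry in the column of that row's own level
     sapply(1:x.f.l,function(y) cumsum(as.integer(x)%in%y))[1:x.l+x.l*(as.integer(x)-1)]
 }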

Testing:

fac1 <- factor(rep(letters[1:3],each=5))

> data.frame(fac1,index=facSeq(fac1))
   fac1 index
1     a     1
2     a     2
3     a     3
4     a     4
5     a     5
6     b     1
7     b     2
8     b     3
9     b     4
10    b     5
11    c     1
12    c     2
13    c     3
14    c     4
15    c     5

More interesting example:

fac2 <- factor(sample(letters[1:5],20,replace=TRUE))

> data.frame(fac2,index=facSeq(fac2))
   fac2 index
1     a     1
2     a     2
3     d     1
4     b     1
5     a     3
6     e     1
7     e     2
8     a     4
9     c     1
10    e     3
11    b     2
12    d     2
13    b     3
14    e     4
15    e     5
16    d     3
17    c     2
18    e     6
19    b     4
20    d     4
James
0

In base R using sequence and table:

df$index <- sequence(table(df$factor))

#    factor index
# 1       a     1
# 2       a     2
# 3       a     3
# 4       a     4
# 5       a     5
# 6       b     1
# 7       b     2
# 8       b     3
# 9       b     4
# 10      b     5
# 11      c     1
# 12      c     2
# 13      c     3
# 14      c     4

Data

df <- data.frame(factor=factor(rep(letters[1:3], times = c(5,5,4))))
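A hedged variation on the same idea, assuming the groups are contiguous but not in level order (the vector `g` and the names `by_runs` and `by_levels` are made up for this sketch). Since table() reports counts in level order, feeding sequence() the run lengths from rle() instead should keep the index aligned with the rows as they appear:

```r
# Hypothetical data: runs are contiguous, but "b" comes before "a"
g <- factor(c("b", "b", "a", "a", "a", "c"))

# Run lengths in order of appearance: 2, 3, 1
by_runs <- sequence(rle(as.character(g))$lengths)

# Counts in level order (a = 3, b = 2, c = 1), which do not match the row order
by_levels <- sequence(table(g))

print(data.frame(factor = g, by_runs = by_runs, by_levels = as.vector(by_levels)))
```

Neither variant handles fully interleaved levels; that case needs a group-wise approach.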
989