numbering by groups

Question

Suppose we have the following data frame:

ID  Shoot  hit
1     10    2
1      9    3
1      8    1
2     10    8
2      8    8
2     11   10
2      7    2
3      9    2
4      6    6
4      6    5
.
.

I would like to add a column that numbers the rows within each group (here, per ID), like this:

ID Shoot hit number.in.group
1   10     2    1
1    9     3    2
1    8     1    3
2   10     8    1
2    8     8    2 
2   11    10    3
2    7     2    4
3    9     2    1
4    6     6    1
4    6     5    2
    .
    .

I could do it easily using a loop. Something like this would work:

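df$number.in.group = rep(1, nrow(df))

# Assumes rows with the same ID are adjacent.
# seq_len(...)[-1] is empty for a one-row data frame, unlike 2:nrow(df).
for (i in seq_len(nrow(df))[-1]) {
    if (df$ID[i] == df$ID[i-1]) {
        df$number.in.group[i] = df$number.in.group[i-1] + 1
    }
}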

My question is, is there any function or more elegant way of doing this other than using a loop?

We don't generally worry about the dates when marking questions as duplicates. There are more, higher-quality answers on the other question. — zwol, Feb 05 '18 at 16:28

score 7 · Answer 1 · answered Jan 25 '12 at 04:13

If you want a one-liner, something like this works (it assumes the rows are already sorted by ID):

df$number.in.group = unlist(lapply(table(df$ID),seq.int))

Simon Urbanek

That's pretty close to the code for `sequence`, no? – joran Jan 25 '12 at 04:18
Well, `sequence(X)` is defined as `unlist(lapply(X,seq_len))` so, yes, you can write it as `sequence(table(df$ID))` - I just prefer to use direct functions and not wrappers - saves time ;) [and fewer functions to remember :P]. – Simon Urbanek Jan 25 '12 at 04:30
You're like Neo; you think in terms of the source code! – joran Jan 25 '12 at 04:34
Believe me, when you're hacking on R, you have to ;) – Simon Urbanek Jan 25 '12 at 04:35
Is there a similar advantage to `unlist(lapply())` over `sapply()`? – Josh O'Brien Jan 25 '12 at 04:35
Not quite, because there is no further processing, so you get the overhead of one extra function call which shouldn't be really noticeable. – Simon Urbanek Jan 25 '12 at 04:38
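
The equivalence discussed in these comments can be checked directly. This is a minimal sketch, assuming a small data frame `df` that is made up here for illustration and is already sorted by `ID`:

```r
# Illustrative data, sorted by ID (an assumption; not from the original post)
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3))

# Group sizes, in the sorted order of the ID values
counts <- table(df$ID)

# The one-liner from the answer; unname() drops the names that unlist() adds
a <- unname(unlist(lapply(counts, seq.int)))

# The wrapper mentioned in the comments
b <- unname(sequence(counts))

a   # 1 2 3 1 2 1
b   # 1 2 3 1 2 1
```

Both forms expand each group size n into 1:n and concatenate the pieces, so either can be assigned to `df$number.in.group` when the rows are sorted by `ID`.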

score 7 · Answer 2 · answered Jan 25 '12 at 04:14

You could just use rle and sequence:

dat <- read.table(text = "ID  Shoot  hit
1     10    2
1      9    3
1      8    1
2     10    8
2      8    8
2     11   10
2      7    2
3      9    2
4      6    6
4      6    5",sep = "",header = TRUE)

> sequence(rle(dat$ID)$lengths)
 [1] 1 2 3 1 2 3 4 1 1 2

Indeed, I think sequence is intended for exactly this purpose.
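
To see why this works, it can help to look at the intermediate result. A short sketch using the ID column from the question, rebuilt here as a plain vector `id` so the snippet runs on its own:

```r
# The ID column from the question's example
id <- c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4)

# rle() reports the length of each run of consecutive equal values
runs <- rle(id)
runs$lengths      # 3 4 1 2

# sequence() expands each length n into 1:n and concatenates the pieces
number.in.group <- sequence(runs$lengths)
number.in.group   # 1 2 3 1 2 3 4 1 1 2
```

Because `rle()` only sees consecutive runs, the numbering restarts whenever the ID changes, so this relies on rows with the same ID being adjacent.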

score 6 · Answer 3 · answered Jan 25 '12 at 05:28

> dat$number.in.group <- ave(dat$ID,dat$ID, FUN=seq_along)
> dat
   ID Shoot hit number.in.group
1   1    10   2               1
2   1     9   3               2
3   1     8   1               3
4   2    10   8               1
5   2     8   8               2
6   2    11  10               3
7   2     7   2               4
8   3     9   2               1
9   4     6   6               1
10  4     6   5               2

score 4 · Accepted Answer · answered Mar 12 '15 at 21:11

Using dplyr

dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))

library(dplyr)
dat %>% group_by(ID) %>%
    mutate(number.in.group = 1:n())

Gregor Thomas

score 2 · Answer 5 · answered Jan 25 '12 at 04:13

There are probably better ways, but one could use tapply on the IDs and pass in a function that returns a sequence.

# Example data
dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))

# Using tapply with a function that returns a sequence
dat$number.in.group <- unlist(tapply(dat$ID, dat$ID, seq_along))
dat

which results in

> dat
   ID          val number.in.group
1   1 -0.454652118               1
2   1 -2.391824247               2
3   2  0.530832021               1
4   2 -1.671043812               2
5   2 -0.045261549               3
6   3  2.311162484               1
7   3 -0.525635803               2
8   3  0.008588811               3
9   3  0.078942033               4
10  3  0.324156111               5

score 2 · Answer 6 · answered Jan 25 '12 at 04:18

df$number.in.group <- unlist(lapply(as.vector(unlist(rle(df$ID)[1])), function(x) 1:x))

Tyler Rinker

Rats, I see joran beat me to the rle solution, and more efficiently. – Tyler Rinker Jan 25 '12 at 04:18

score 1 · Answer 7 · answered Jan 25 '12 at 20:02

Here's another solution, using plyr:

library(plyr)
ddply(dat, .(ID), transform, num_in_grp = seq_along(hit))

edited Jan 26 '12 at 12:11

answered Jan 25 '12 at 20:02

Ramnath

`val` corresponds to `hit`; see the edited answer. – Ramnath Jan 26 '12 at 12:12

score 0 · Answer 8 · answered Feb 06 '14 at 21:47

I compared your answers, and IShouldBuyABoat's is the most promising. I found that the function ave can be applied even if the dataset is not sorted by the grouping variable.

Consider this dataset:

dane<-data.frame(g1=c(-1,-2,-2,-2,-3,-3,-3,-3,-3),
             g2=c('reg','pl','reg','woj','woj','reg','woj','woj','woj'))

joran's answer applied to my example:

> sequence(rle(as.character(dane$g2))$lengths)
[1] 1 1 1 1 2 1 1 2 3

Simon Urbanek's proposal and its result:

> unlist(lapply(table(dane$g2),seq.int))
  pl reg1 reg2 reg3 woj1 woj2 woj3 woj4 woj5 
   1    1    2    3    1    2    3    4    5

IShouldBuyABoat's code gives the correct answer:

> as.numeric(ave(as.character(dane$g1),as.character(dane$g1),FUN=seq_along))
[1] 1 1 2 3 1 2 3 4 5
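
Note that the `ave` call above groups by `g1`, whose equal values are already adjacent. A minimal sketch of the same idea applied to the unsorted column `g2` follows. The vector is copied from `dane` above, and passing `seq_along(g2)` as the first argument is a variation introduced here (not from the original answers) so that the result stays integer without a character round trip:

```r
# Unsorted grouping variable from the example above
g2 <- c('reg', 'pl', 'reg', 'woj', 'woj', 'reg', 'woj', 'woj', 'woj')

# ave() splits its first argument by the grouping variable, applies FUN to
# each piece, and writes the results back in the original row order
num <- ave(seq_along(g2), g2, FUN = seq_along)
num   # 1 1 2 1 2 3 3 4 5
```

Each value is the running count of its own group, regardless of where the group's rows sit in the data.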
