0

I have this data frame

    df <- data.frame(id=c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5),
 school=c('school_1','school_2','school_3','school_3','school_4','school_5','school_1','school_1','school_1','school_4','school_6','school_7','school_5','school_5','school_8','school_10','school_10','school_10','school_12','school_13','school_7','school_2','school_2','school_13','school_2'))

and I would like to sequentially order it by id and school. If the school should repeat, I would want the same number used. Once the school changes, the seq should too. Ideal output is below. (I added breaks by id just so it's easier to read)

id  school  seq
1   school_1    1
1   school_2    2
1   school_3    3
1   school_3    3
1   school_4    4
        
2   school_5    1
2   school_1    2
2   school_1    2
2   school_1    2
2   school_4    3
        
3   school_6    1
3   school_7    2
3   school_5    3
3   school_5    3
3   school_8    4
        
4   school_10   1
4   school_10   1
4   school_10   1
4   school_12   2
4   school_13   3
        
5   school_7    1
5   school_2    2
5   school_2    2
5   school_13   3
5   school_2    4

I've tried:

setDT(df)[, sequence := seq_len(.N), by = c("id", "school")]

and per this question

df[, .id := sequence(.N), by = "id,school"]

and neither produced what I wanted. Many of the suggestions don't have numbers repeating if the second variable doesn't change.

tangerine7199
  • 443
  • 2
  • 8
  • 24

2 Answers2

3

You could use data.table:

library(data.table)
setDT(df)
df[, seq := rleid(school), by = id]

df
    id    school seq
 1:  1  school_1   1
 2:  1  school_2   2
 3:  1  school_3   3
 4:  1  school_3   3
 5:  1  school_4   4
 6:  2  school_5   1
 7:  2  school_1   2
 8:  2  school_1   2
 9:  2  school_1   2
10:  2  school_4   3
11:  3  school_6   1
.....
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

You can use match + unique to get unique school number for each id.

This can be done using dplyr

library(dplyr)
df %>% group_by(id) %>% mutate(seq = match(school, unique(school)))

#      id school     seq
#   <dbl> <chr>    <int>
# 1     1 school_1     1
# 2     1 school_2     2
# 3     1 school_3     3
# 4     1 school_3     3
# 5     1 school_4     4
# 6     2 school_5     1
# 7     2 school_1     2
# 8     2 school_1     2
# 9     2 school_1     2
#10     2 school_4     3
# … with 15 more rows

Base R :

df$seq <- with(df, ave(school, id, FUN = function(x) match(x, unique(x))))

and data.table :

library(data.table)
setDT(df)[, seq := match(school, unique(school)), id]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213