
I have nested data of students in classes in schools.

If I have a student number running from 1 to n, a classnumber from 1 to n and a schoolnumber from 1 to n, how would I create a new column that numbers the students sequentially within each class? The new column would read 1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,1,2,1,2,3 and so on. At each new classnumber the count begins again from 1.

At the moment I have gone a very long way round. I use `table(classnr)`, which gives me the number of pupils in each class, and then I type out `mydata$pupilinclass <- c(1:25, 1:7, 1:5, 1:15...` by hand. For a large dataset this is a lot of lines.

There must be a quicker way of doing this - can anyone help?

Rachel
  • It may be useful to have a reproducible example, as I used `ClassNumber` and `SchoolNumber` as grouping variables. You may remove `SchoolNumber` from the code below if it is not required. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Apr 17 '15 at 15:11

2 Answers


Try

 # Restart the count at 1 within each class of each school (base R)
 mydata$Sequence <- with(mydata, ave(seq_along(studentID), ClassNumber,
                      SchoolNumber, FUN=seq_along))

Or for a quicker option

library(data.table)
# setDT() converts mydata to a data.table by reference; .N is the row count of each group
setDT(mydata)[, grp := 1:.N, by = list(ClassNumber, SchoolNumber)]
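
As no reproducible example was posted, here is a minimal sketch of the base R approach on made-up data. The data frame and its column names (`SchoolNumber`, `ClassNumber`, `studentID`) are assumptions for illustration, not the asker's real data. It also shows what may happen if `SchoolNumber` is dropped from the grouping while class numbers repeat across schools.

```r
# Toy data (hypothetical): two schools whose class numbers overlap
mydata <- data.frame(
  SchoolNumber = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  ClassNumber  = c(1, 1, 1, 2, 2, 1, 1, 2, 2),
  studentID    = 1:9
)

# Counter restarts at 1 for every ClassNumber/SchoolNumber combination
mydata$Sequence <- with(mydata, ave(seq_along(studentID), ClassNumber,
                                    SchoolNumber, FUN = seq_along))
print(mydata$Sequence)
# 1 2 3 1 2 1 2 1 2

# Grouping by ClassNumber alone pools class 1 of school 1 with class 1 of school 2
by_class_only <- with(mydata, ave(seq_along(studentID), ClassNumber,
                                  FUN = seq_along))
print(by_class_only)
# 1 2 3 1 2 4 5 3 4
```

If class numbers are unique across the whole dataset, grouping by the class column alone should be enough.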
akrun

Using the dplyr package, you could do:

library(dplyr)

mydata = mydata %>% group_by(ClassNumber, SchoolNumber) %>%
             mutate(Sequence=1:n())
eipi10