Select first 80 observations for each level in R

Question

I have a data set that looks like this:

structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
    0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1", 
    "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", 
    "t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3", 
    "t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A", 
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
6L), class = "data.frame")

I want to select the first 80 observations of all variables for each TID. So far, I can do this with the first TID only using the code:

sub.data1<-NM[1:80, ]

How can I do it for all my other TIDs?

Thanks!

flodel · Answer 1 · 2013-05-23T19:58:20.647

7

I would do:

lapply(split(dat, dat$TID), head, 80)

It returns a list of data.frames, each with 80 (or fewer) rows. If you instead want everything in one data.frame:

do.call(rbind, lapply(split(dat, dat$TID), head, 80))
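To see what the two calls return, here is a small sketch on made-up data. The toy data frame `dat`, its values, and the cutoff of 2 rows are assumptions for illustration only; with the real data the cutoff would be 80. It also shows that `do.call(rbind, ...)` on a named list prefixes the row names with the group name, which you may want to reset.

```r
# Toy data: two TID levels with 3 rows each (hypothetical values)
dat <- data.frame(TID = factor(c("t1", "t1", "t1", "t2", "t2", "t2")),
                  T   = c(0.04, 0.08, 0.12, 0.04, 0.08, 0.12))

# One data.frame per TID, each truncated to its first 2 rows
pieces <- lapply(split(dat, dat$TID), head, 2)

# Stack the pieces back into a single data.frame
first2 <- do.call(rbind, pieces)

# rbind() builds row names such as "t1.1"; reset them if not wanted
rownames(first2) <- NULL
print(first2)
```

All columns of `dat` are carried through, so no variables are lost.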

edited May 23 '13 at 19:58

answered May 23 '13 at 19:46

flodel


sorry I forgot to mention that I want to retain all the other variables too. – Kaye11 May 23 '13 at 19:51

score 5 · Accepted Answer · edited May 23 '13 at 20:00


Using the function ddply() from plyr, you can split the data by TID, select the first 80 rows of each piece with head(), and then combine everything back into one data frame:

library(plyr)
ddply(NM, .(TID), head, n = 80)

edited May 23 '13 at 20:00

Paul Hiemstra


answered May 23 '13 at 19:48

Didzis Elferts


3

+1! Probably there is no need for the lambda function, `ddply(NM, .(TID), head, n = 80)` should work. – Paul Hiemstra May 23 '13 at 19:51

score 3 · Answer 3 · answered May 23 '13 at 20:05

Using data.table, I made a shorter example with just TIDs t1 and t2 that returns the first 2 rows of each. It can be adjusted for your data.

library(data.table)
data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
                "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
                "25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
                0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
                        418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
                GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
                        0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A", 
                "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
                6L), class = "data.frame")
dt<-data.table(data)
dt[,head(.SD,2),by=TID]

This results in:

   TID A    T     X     Y V GD ND ND2
1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
2:  t1 1 0.08 464.4 418.5 0  0  0   0
3:  t2 1 0.16 464.4 418.5 0  0  0   0
4:  t2 1 0.20 464.4 418.5 0  0  0   0

and can be changed back to a data frame if desired by changing the last line to

as.data.frame(dt[,head(.SD,2),by=TID])

score 2 · Answer 4 · answered May 23 '13 at 20:14


Here is another solution in base:

do.call(rbind, by(NM, NM$TID, head, 80))
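A further base R variant, which is not part of the answer above, avoids splitting and re-binding altogether: number the rows within each TID with `ave()` and keep those whose position is at most the cutoff. The sketch below uses a made-up stand-in for `NM` and a cutoff of 2; the data values and the names `idx`, `n`, and `firstn` are assumptions for illustration.

```r
# Toy data standing in for NM (hypothetical values); TIDs are interleaved
NM <- data.frame(TID = factor(c("t1", "t2", "t1", "t2", "t1", "t2")),
                 T   = c(0.04, 0.04, 0.08, 0.08, 0.12, 0.12))

# Number the rows within each TID: 1, 2, 3, ... per group
idx <- ave(seq_len(nrow(NM)), NM$TID, FUN = seq_along)

# Keep rows whose within-group position is at most n (80 for the real data)
n <- 2
firstn <- NM[idx <= n, ]
print(firstn)
```

Because this is plain row subsetting, the rows stay in their original order and keep their original row names, whereas the `rbind` approaches return the rows grouped by TID.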

answered May 23 '13 at 20:14

Matthew Lundberg

