5

I have a data set that looks like this:

structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
    GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
    0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1", 
    "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", 
    "t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3", 
    "t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A", 
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
6L), class = "data.frame")

I want to select the first 80 observations of all variables for each TID. So far, I can do this with the first TID only using the code:

sub.data1<-NM[1:80, ]

How can I do it for all my other TIDs?

Thanks!

Matthew Plourde
  • 43,932
  • 7
  • 96
  • 113
Kaye11
  • 359
  • 5
  • 17

4 Answers4

7

I would do:

lapply(split(dat, dat$TID), head, 80)

It returns a list of data.frames with 80 (or less) rows. If instead you want everything into one data.frame:

do.call(rbind, lapply(split(dat, dat$TID), head, 80))
flodel
  • 87,577
  • 21
  • 185
  • 223
5

Using function ddply() from plyr you can split data by TID and then select forst 80 with head() and then put all again in one data frame,

library(plyr)
ddply(NM, .(TID), head, n = 80)
Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
Didzis Elferts
  • 95,661
  • 14
  • 264
  • 201
3

Using data tables, I made a shorter example with just TIDs t1 and t2 that returns the first 2 rows of t1 and t2. It can be adjusted for your data.

library(data.table)
data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
                "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
                "25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2, 
                0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5, 
                        418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0), 
                GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA, 
                        0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A", 
                "T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA, 
                6L), class = "data.frame")
dt<-data.table(data)
dt[,head(.SD,2),by=TID]

This results in:

   TID A    T     X     Y V GD ND ND2
1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
2:  t1 1 0.08 464.4 418.5 0  0  0   0
3:  t2 1 0.16 464.4 418.5 0  0  0   0
4:  t2 1 0.20 464.4 418.5 0  0  0   0

and can be changed back to a data frame if desired by changing the last line to

as.data.frame(dt[,head(.SD,2),by=TID])
Docuemada
  • 1,703
  • 2
  • 25
  • 44
2

Here is another solution in base:

do.call(rbind, by(NM, NM$TID, head, 80))
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112