Ok, so here is the problem.

I have a dataset that lists the activity (of various types) associated with various IDs at various times. The dataset is a few tens of thousands of rows long and looks like this:

      ID        DATE_EVENT TIME_EVENT EVENT_TYPE
 1:   520424473 07/08/2014   09:28:16      9,210      
 2:   504344215 07/08/2014   09:10:27      1,000    
 3:   051745297 07/08/2014   09:40:16      1,000    
 4:   961837100 07/08/2014   09:44:13      1,000     
 5:   412980113 07/08/2014   09:40:59      1,000
 6:   051745297 07/08/2014   09:40:23      9,034
 7:   520424473 07/08/2014   09:28:22      1,000

What I would like to do is group the rows by ID, order them chronologically, and then compute statistics on how long was spent in each EVENT_TYPE across the whole dataset (or, even better, in a range of EVENT_TYPEs). I have used this before

library(data.table)
setDT(Allvol)[, list(mean = mean(volume, na.rm = TRUE), 
                     sd = sd(volume, na.rm = TRUE)), by = ID]

on an earlier dataset to group the data by ID and work out the mean and s.d. for each one. That dataset was slightly different, though: it had a column of volumes associated with the EVENT_TYPEs. I think I need something similar here but am not sure how to approach it.

Any help is much appreciated!

Matt Dowle
Taylrl
You will probably get more answers if your post includes a reproducible example. See: http://stackoverflow.com/questions/5963269/ – Ernest A Aug 13 '14 at 13:44

Looks like you have a good start. If you want to see the data at different times, you could try `Allvol[ , list(mean = mean(volume, na.rm = T), sd=sd(volume, na.rm = T)), by = list(ID,Time=cut(Time,5))]` – Mike.Gahan Aug 13 '14 at 14:43

1 Answer

You have not provided your data, but the following may be helpful:

# Simulated data: 100 random volumes, each assigned to one of 10 ids
volume = sample(1000:2000, 100)
id = sample(1:10, 100, replace = TRUE)
allvol = data.frame(id, volume)

head(allvol)
  id volume
1  5   1946
2  6   1828
3  5   1851
4  6   1296
5  5   1285
6  8   1238

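# Per-id mean and standard deviation of volume
means = with(allvol, tapply(volume, id, mean))
sds = with(allvol, tapply(volume, id, sd))

# Combine into one data frame; the ids are the names of the tapply() results
outdf = data.frame(id = names(means), means, sds)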

outdf
   id    means      sds
1   1 1566.000 397.5433
2   2 1504.818 368.3938
3   3 1660.600 328.4202
4   4 1518.308 265.1347
5   5 1482.000 309.9055
6   6 1342.800 281.8632
7   7 1555.444 232.2246
8   8 1556.667 286.3241
9   9 1588.500 283.5166
10 10 1505.867 348.3440
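
For the time-per-EVENT_TYPE part of the question, here is a rough data.table sketch built on the seven sample rows from the post. It rests on several assumptions that are not stated in the question: dates are day/month/year, an event lasts until the next event for the same ID, the last event of each ID has no measurable duration, and EVENT_TYPE is a numeric code printed with a thousands separator. The names `events`, `timestamp`, `secs`, `duration_secs`, `event_code`, `duration_stats` and `range_stats` are made up for illustration.

```r
library(data.table)

# Toy data: the seven sample rows shown in the question
events <- data.table(
  ID         = c("520424473", "504344215", "051745297", "961837100",
                 "412980113", "051745297", "520424473"),
  DATE_EVENT = rep("07/08/2014", 7),
  TIME_EVENT = c("09:28:16", "09:10:27", "09:40:16", "09:44:13",
                 "09:40:59", "09:40:23", "09:28:22"),
  EVENT_TYPE = c("9,210", "1,000", "1,000", "1,000",
                 "1,000", "9,034", "1,000")
)

# Assumption: day/month/year; use "%m/%d/%Y" instead if the dates are US-style
events[, timestamp := as.POSIXct(paste(DATE_EVENT, TIME_EVENT),
                                 format = "%d/%m/%Y %H:%M:%S", tz = "UTC")]
events[, secs := as.numeric(timestamp)]

# Assumption: EVENT_TYPE is a numeric code with a thousands separator
events[, event_code := as.integer(gsub(",", "", EVENT_TYPE))]

# Order chronologically within each ID
setorder(events, ID, timestamp)

# Assumption: an event lasts until the next event of the same ID;
# the last event of each ID gets NA because its end is unknown
events[, duration_secs := shift(secs, type = "lead") - secs, by = ID]

# Statistics per EVENT_TYPE across the whole dataset
duration_stats <- events[!is.na(duration_secs),
                         .(n = .N,
                           mean_secs = mean(duration_secs),
                           sd_secs = sd(duration_secs)),
                         by = EVENT_TYPE]

# Same statistics pooled over a range of event codes (here 9000 to 9999)
range_stats <- events[!is.na(duration_secs) & event_code %between% c(9000L, 9999L),
                      .(n = .N,
                        mean_secs = mean(duration_secs),
                        sd_secs = sd(duration_secs))]
```

With only seven rows each group has a single duration, so `sd_secs` comes out as NA; on the full dataset the groups should be large enough for it to be meaningful. If an event should be treated differently (for example, if each row marks the end of an activity rather than the start), switch `type = "lead"` to `type = "lag"` and flip the subtraction.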
rnso