How to run a loop on different sections of the same data.frame

Question

Suppose I have a data frame with two variables on which I'm trying to run some basic summary stats. I would like to run a loop that gives me the difference between the minimum and maximum seconds values for each unique value of number. My actual data frame is huge and contains many values for 'number', so subsetting and running each one individually is not a realistic option. The data looks like this:

df <- data.frame(number=c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
                 seconds=c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109)) 
     number seconds
1       1       1
2       1       4
3       1       8
4       2       1
5       2       5
6       2      11
7       2      23
8       3       1
9       3       8
10      4       1
11      4       9
12      4      11
13      4      24
14      4      44
15      4     112
16      5       1
17      5      34
18      5      55
19      5     109

My current code only returns the difference between the minimum and maximum seconds for the entire data frame:

ZZ <- unique(df$number)
for (i in ZZ){
      Y <- max(df$seconds) - min(df$seconds) 
}

Why do you need a loop? Aggregate might work better here. Or any of the 'do something by something' libraries like dplyr or data.table. — Heroka, Nov 04 '15 at 15:15
Thank you @Heroka. Although the code below does exactly what I want, this thread should prove useful. — Jojo, Nov 04 '15 at 15:22

R Yoda · Accepted Answer · 2015-11-04T15:25:19.637

Since you have a lot of data, performance matters, so you should use a data.table instead of a data.frame:

library(data.table)
dt <- as.data.table(df)
dt[, .(spread = (max(seconds) - min(seconds))), by=.(number)]

   number spread
1:      1      7
2:      2     22
3:      3      7
4:      4    111
5:      5    108
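
For comparison, here is a minimal base R sketch of the same per-group calculation. It is an illustration rather than part of the accepted approach: it first repairs the loop from the question (which never subset the data by `i` and overwrote `Y` on every pass) and then shows the `aggregate()` route suggested in the comments. The names `groups`, `spread_loop` and `spread_agg` are made up for this example.

```r
df <- data.frame(number  = c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
                 seconds = c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))

# Corrected loop: restrict seconds to the rows of the current group and
# store one result per group instead of overwriting a single variable.
groups <- unique(df$number)
spread_loop <- numeric(length(groups))
for (k in seq_along(groups)) {
  s <- df$seconds[df$number == groups[k]]
  spread_loop[k] <- max(s) - min(s)
}

# Loop-free base R: aggregate() applies the function to seconds within
# each value of number; diff(range(s)) equals max(s) - min(s).
spread_agg <- aggregate(seconds ~ number, data = df,
                        FUN = function(s) diff(range(s)))
names(spread_agg)[2] <- "spread"
```

Both variants should give the same spreads as the data.table result above; on very large data the data.table version is likely to be faster.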

edited Nov 04 '15 at 15:25

answered Nov 04 '15 at 15:19

Perfect and elegant. Thank you very much. – Jojo Nov 04 '15 at 15:21
or just `setDT(df)[, max(seconds)-min(seconds), by=number]` – Cath Nov 04 '15 at 15:24
Or just `setDT(df)[, diff(range(seconds)), by = number]` – David Arenburg Nov 04 '15 at 19:46
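
The two one-liners above differ from the accepted answer mainly in how the table is created. A small sketch of that difference, assuming the data.table package is installed; `res_copy` and `res_ref` are names chosen only for this illustration:

```r
library(data.table)

df <- data.frame(number  = c(1,1,1,2,2,2,2,3,3,4,4,4,4,4,4,5,5,5,5),
                 seconds = c(1,4,8,1,5,11,23,1,8,1,9,11,24,44,112,1,34,55,109))

# as.data.table() returns a new data.table; df itself stays a data.frame.
dt <- as.data.table(df)
res_copy <- dt[, .(spread = diff(range(seconds))), by = number]

# setDT() converts df in place (by reference), avoiding a copy,
# which can matter when the data is large.
setDT(df)
res_ref <- df[, .(spread = diff(range(seconds))), by = number]
```

After `setDT(df)`, `df` is itself a data.table, so later code that relies on plain data.frame indexing may behave differently.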
