Let's generate a 100k-row data.frame to see which method is fastest.
grade.data = data.frame(Quiz = sample(100000), age = sample(18:24, 100000, replace = TRUE))
library(data.table)
dt.grade.data = as.data.table(grade.data)
The methods posted here:
data.table = function(x) dt.grade.data[,sum(Quiz>=100*.45)]
logical.sum = function(x) sum(grade.data$Quiz >= (100*.45))
logical.counting.table = function(x) table(grade.data$Quiz >= (100*.45))[["TRUE"]]
logical.which = function(x) length(which(grade.data$Quiz >= (100*.45)))
subsetting = function(x) nrow(grade.data[grade.data$Quiz >= (100*.45),])
subset.cmd = function(x) nrow(subset(grade.data, Quiz >= (100*.45) ))
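(Aside: logical.sum() works because arithmetic on a logical vector coerces TRUE to 1 and FALSE to 0, so summing the comparison counts the matching rows directly. A minimal illustration, separate from the benchmark:

sum(c(TRUE, FALSE, TRUE))  # 2
sum(grade.data$Quiz >= 45) # same count as logical.sum(), since 100*.45 == 45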
Benchmark
library(microbenchmark)
microbenchmark(data.table(), logical.sum(), logical.counting.table(), logical.which(), subsetting(), subset.cmd(), times = 100L)
Unit: microseconds
                     expr       min         lq    median        uq       max neval
             data.table()  1766.148  2188.8000  2308.267  2469.405  29185.36   100
            logical.sum()   739.385   945.4765   993.921  1074.386  10253.67   100
 logical.counting.table() 28867.605 30847.0290 31546.796 32725.255  65514.14   100
          logical.which()   701.205  1080.9555  1138.635  1228.545   3565.96   100
             subsetting() 27376.931 28406.7730 29243.866 30564.371 168034.45   100
             subset.cmd() 29004.315 31203.1730 32219.878 33362.003  89801.34   100
It seems that a vectorized logical check is the fastest method. On a smaller data frame (500 rows), data.table is actually much slower than all the other methods.
edit: Apparently, the relative efficiency of logical.sum() and logical.which() depends on the data structure; a different Quiz score distribution can make logical.sum() the fastest method. And, as expected, data.table selection/subsetting blows data.frame subsetting out of the water.
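If you want to reproduce that edit, here is a sketch under my own assumptions; the bounded score distribution and the dt.subsetting name are illustrative, not from the original run:

# Assumption: scores bounded to 0-100 instead of sample(100000)
grade.data$Quiz = sample(0:100, 100000, replace = TRUE)
dt.grade.data = as.data.table(grade.data)
# data.table counterpart of subsetting()/subset.cmd()
dt.subsetting = function(x) nrow(dt.grade.data[Quiz >= (100*.45)])
microbenchmark(logical.sum(), logical.which(), dt.subsetting(), subsetting(), times = 100L)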