I'm no data.table expert, but from what I understand its primary advantage is in indexing. So try subsetting with the various packages to compare speeds.
library(microbenchmark)
library(data.table)

# 1e6 x 10 numeric matrix plus a character key column with values "1" to "10"
mat <- matrix(rnorm(1e7), ncol = 10)
key <- as.character(sample(1:10, 1e6, replace = TRUE))

# Base data.frame, timed before the dataframe package is loaded
mat2df.base <- data.frame(mat)
mat2df.base$key <- key
bm.before <- microbenchmark(
  mat2df.base[mat2df.base$key == 2, ]
)

# Same data, built after loading the dataframe package
library(dataframe)
mat2df.dataframe <- data.frame(mat)
mat2df.dataframe$key <- key

# data.table version, keyed on the key column
mat2dt <- data.table(mat)
mat2dt$key <- key
setkey(mat2dt, key)

bm.subset <- microbenchmark(
  mat2df.base[mat2df.base$key == 2, ],
  mat2df.dataframe[mat2df.dataframe$key == 2, ],
  mat2dt["2", ]
)
expr min lq median uq max
1 mat2df.base[mat2df.base$key == 2, ] 153.99596 154.98602 155.91621 157.0894 194.24456
2 mat2df.dataframe[mat2df.dataframe$key == 2, ] 153.63907 154.66295 155.68553 156.9827 173.76913
3 mat2dt["2", ] 15.51085 15.66742 15.72899 15.8463 22.53044
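With a sufficiently large matrix, data.table wipes the floor with the other options: the keyed lookup has a median of about 15.7 versus about 155.9 for either data.frame subset, roughly a 10x speedup.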
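As far as I understand it, the speedup comes from how the rows are located: `setkey` sorts the table by the key column, so a keyed lookup can binary-search for the matching block instead of comparing every row. Here is a minimal sketch on a small table (the column name `grp` and the sizes are made up for illustration) showing that the keyed lookup returns the same rows as a plain logical-vector subset:

```r
library(data.table)

set.seed(1)
# Small stand-in for the big table; `grp` is a hypothetical column name.
dt <- data.table(
  x   = rnorm(1000),
  grp = as.character(sample(1:10, 1000, replace = TRUE))
)

# Logical-vector subset: every row's grp is compared against "2".
scan_rows <- dt[dt$grp == "2"]

# Sort by grp (by reference) and mark it as the key.
setkey(dt, grp)

# Character i is matched against the key column via binary search.
keyed_rows <- dt["2"]

# Both approaches select the same set of rows.
nrow(scan_rows) == nrow(keyed_rows)
```

On a table this small the timing difference is negligible; the gap only shows up at sizes like the 1e6-row example.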
Also, I suspect that @RJ-'s attempt to compare the performance of base data.frame with the dataframe package's data.frames is not working. The timings are just too similar, and I suspect both results come from the loaded library rather than from base.
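One way to check a suspicion like this directly, rather than inferring it from timings, is to grab base R's subsetting method before loading anything and compare it with what is found afterwards. This is only a sketch of the idea, not something I ran against the dataframe package; the `library()` call is left as a comment because the package may not be installed:

```r
# Capture base R's data.frame subsetting method before attaching other packages.
base_subset <- get("[.data.frame", envir = baseenv())

# library(dataframe)  # assumption: this is where the package would be loaded

# The method that is found for `[` on a data.frame from the current environment.
current_subset <- getS3method("[", "data.frame")

# TRUE means the method is still base R's own; FALSE means something replaced or masked it.
identical(base_subset, current_subset)
```

If this comes back FALSE after loading a package, timings taken on a "base" data.frame after that point are really timing the replacement method.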
Edit: Tested, and it doesn't seem to make much of a difference. bm.after is the same code as bm.subset above, just run in the same session as bm.before (which was timed before loading dataframe) to provide a fair comparison.
bm.before <- microbenchmark(
mat2df.base[mat2df.base$key==2,]
)
> bm.after
Unit: milliseconds
expr min lq median uq max
1 mat2df.base[mat2df.base$key == 2, ] 160.62708 166.25787 167.52325 169.18710 173.47864
2 mat2df.dataframe[mat2df.dataframe$key == 2, ] 163.30259 166.00588 167.80138 169.24647 174.05713
3 mat2dt["2", ] 16.16117 16.89627 17.09047 17.37057 62.01954
> bm.before
Unit: milliseconds
expr min lq median uq max
1 mat2df.base[mat2df.base$key == 2, ] 159.178 160.9867 162.1149 164.0046 195.9501