I just spent some time reading up on data.table in R and was wondering about the conditions under which I can expect the largest performance gains. Maybe the simple answer is: when I have a large data.frame and often operate on subsets of it. When I just load data files and estimate models I can't expect much, but many `[` operations make the difference. Is that the only answer, or what else should I consider? And when does it start to matter: 10x5, 1,000x5, 1,000,000x5?
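For concreteness, here is a rough sketch of the kind of comparison I have in mind, namely keyed subsetting on a data.table versus the equivalent data.frame subset (column names, sizes, and the lookup value are made up for illustration; it assumes the data.table and microbenchmark packages are installed):

```r
library(data.table)
library(microbenchmark)

n  <- 1e6                                  # try 10, 1e3, 1e6, ...
df <- data.frame(id = sample(1e4, n, replace = TRUE),
                 x1 = rnorm(n), x2 = rnorm(n),
                 x3 = rnorm(n), x4 = rnorm(n))
dt <- as.data.table(df)
setkey(dt, id)                             # sorts once; later lookups can use binary search

microbenchmark(
  data.frame = df[df$id == 42L, ],         # vectorised scan over all rows
  data.table = dt[.(42L)],                 # keyed join / binary search
  times = 100
)
```

Is this the kind of repeated `[` subsetting where the gap gets large, and roughly at which n?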
Edit: Some of the comments suggest that data.table is often faster and, equally important, almost never slower. So it would also be good to know when not to use data.table.