I have done some searching around the Internet and SO looking for an introduction or analysis of what makes data.table so fast, but I've only found a lot of (very helpful) manuals, no breakdown of what goes into the programming. (I am more or less completely floored that I can't locate a published paper for data.table, not even something from JStatSoft.)
I've had an algorithms class so I know about sorts and linked lists and binary trees and such, but I don't want to make any amateur guesses (especially when I go to explain to academic people why it's a good idea to use it). Can anyone offer a short, topical summary with references? This question references a slide presentation which is cool, but the info comes in pieces (and even the documentation for, say, setkey() doesn't cite a data.table reference, but goes to Wikipedia).
What I am looking for is neither the source code nor a list of Wikipedia topics but, ideally, an "official", sourced answer (thus making it canonical, which could help a lot with all the questions orbiting around this topic).
(It would be great if there were a technical paper out there I could cite for this, since the citation() for data.table is just the manual, but of course that's not directly relevant to the question as far as SO is concerned.)