If you need something faster than table(), including for cross-tabulation, collapse::qtab(), available since v1.8.0 (May 2022), is a faithful and noticeably faster replacement. In the univariate case, fcount() can also be used; it returns a data.frame instead of a table.
```r
library(collapse) # >= v1.8.0, and >= v1.9.0 for fcount()
library(microbenchmark)
v = sample(10000, 1e6, TRUE)
microbenchmark(qtab(v, sort = FALSE), fcount(v), tabulate(v), times = 10)
#> Unit: milliseconds
#>                   expr      min       lq     mean   median       uq      max neval
#>  qtab(v, sort = FALSE) 1.911707 1.945245 2.002473 1.963654 2.027942 2.207891    10
#>              fcount(v) 1.885549 1.906746 1.978894 1.932310 2.103997 2.138027    10
#>            tabulate(v) 2.321543 2.323716 2.333839 2.328206 2.334499 2.372506    10
```
```r
v2 = sample(10000, 1e6, TRUE)
microbenchmark(qtab(v, v2), qtab(v, v2, sort = FALSE), table(v, v2), times = 10)
#> Unit: milliseconds
#>                       expr       min        lq      mean   median        uq      max neval
#>                qtab(v, v2)  45.61279  51.14840  74.16168  60.7761  72.86385 157.6501    10
#>  qtab(v, v2, sort = FALSE)  41.30812  49.66355  57.02565  51.3568  54.69859 118.1289    10
#>               table(v, v2) 281.60079 282.85273 292.48119 286.0535 288.19253 349.5513    10
```
That being said, tabulate() is about as fast as it gets as far as C code is concerned. But it has a clear caveat: it does not hash the values at all. Instead, it determines the maximum value, allocates a result vector of that length, and uses that vector as a lookup table, incrementing the slot whose index equals each value (which is also why it only counts positive integers). Consider this:
```r
v[10] = 1e7L # Insert a single very large value
length(tabulate(v))
#> [1] 10000000
length(table(v))
#> [1] 10001
length(qtab(v))
#> [1] 10001
```
So tabulate() returns a vector of length 1e7 in which all but 10,001 entries (about 9.99 million) are zeros, and its performance deteriorates accordingly:
```r
microbenchmark(qtab(v, sort = FALSE), fcount(v), tabulate(v), times = 10)
#> Unit: milliseconds
#>                   expr      min       lq     mean   median       uq       max neval
#>  qtab(v, sort = FALSE) 1.873249 1.900473 1.966721 1.923064 2.064186  2.126588    10
#>              fcount(v) 1.829338 1.850330 1.926676 1.880199 2.021013  2.057667    10
#>            tabulate(v) 4.207789 4.357439 5.066296 4.417012 4.558216 10.347744    10
```
In light of this, it is rather remarkable that qtab() actually hashes every value and still achieves this performance.
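To make the difference between the two strategies concrete, here is a minimal base-R sketch. direct_count() and hashed_count() are hypothetical helpers written only for illustration: direct_count() mimics the direct-address approach described above for tabulate(), while hashed_count() first uses base R's match() (which hashes its input) to map each value to a dense id in 1..k, with k the number of distinct values, and only then counts. This is not how qtab() is implemented internally, just the general idea of why hashing avoids the memory blow-up.

```r
# Hypothetical helper: direct addressing, as tabulate() does.
# Assumes x is a vector of positive integers.
# One slot per integer 1..max(x), so memory grows with the largest value.
direct_count <- function(x) {
  counts <- integer(max(x))                    # max(x) zeros, allocated up front
  for (xi in x) counts[xi] <- counts[xi] + 1L  # the value itself is the slot index
  counts
}

# Hypothetical helper: hash first, then count.
# match() maps each value to its position among the sorted distinct values,
# giving dense ids 1..k, so only k slots are needed.
hashed_count <- function(x) {
  u <- sort(unique(x))                       # distinct values, sorted like table()
  id <- match(x, u)                          # dense group ids in 1..length(u)
  counts <- tabulate(id, nbins = length(u))  # tabulate() is now compact
  names(counts) <- u
  counts
}

x <- c(3L, 1L, 3L, 7L)
direct_count(x)   # 1 0 2 0 0 0 1, same as tabulate(x)
hashed_count(x)   # counts 1 2 1 for the values 1, 3 and 7

x_big <- c(x, 1e7L)           # one large value, as in the example above
length(direct_count(x_big))   # 10000000 slots, almost all of them zero
length(hashed_count(x_big))   # 4 slots, one per distinct value
```

The trade-off is the one the benchmarks show: direct addressing does no hashing work but pays for the full value range, while the hash-based approach pays a per-value lookup cost that is independent of how large the values are.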