I was doing some micro-optimization today for a related problem: checking whether a numeric vector is empty (i.e. equivalent to numeric(0)) when it can either be empty or hold a value (it is never NA or NULL). In my problem the check occurs hundreds of millions of times, so it seemed reasonable to benchmark the options. length() benchmarks quite a bit better than the alternatives:
vec = numeric(0)
bench::mark(
  x = { !length(vec) },
  y = { rlang::is_empty(vec) },
  z = { identical(vec, numeric(0)) },
  check = FALSE,
  min_time = 5,
  min_iterations = 100000,
  max_iterations = 100000
)
# A tibble: 3 x 6
  expression      min   median `itr/sec` `gc/sec`  n_gc
  <bch:expr> <bch:tm> <bch:tm>     <dbl>    <dbl> <dbl>
1 x             200ns    300ns  3621037.     0        0
2 y             5.2us    5.8us   166594.     8.33     5
3 z             1.3us    1.5us   618090.    12.4      2
The length check beats the identical() check by about 6x, and identical() in turn beats is_empty() by about 4x. The results for the case where the vector is non-empty are similar, so irrespective of the distribution of your data, just use length.
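For anyone who wants to confirm the non-empty case on their own machine, here is a minimal sketch of the same comparison with a one-element vector. The value 42 and the lower iteration count are my own arbitrary choices, not part of the original run, and the benchmark is skipped if bench or rlang is not installed. No timings are shown because they will vary by machine.

```r
# Same three checks as above, but against a non-empty vector.
# The value 42 is an arbitrary placeholder.
vec <- c(42)

# Sanity check in base R: the vector should not be reported as empty.
is_empty_by_length <- !length(vec)
is_empty_by_identical <- identical(vec, numeric(0))

# Only run the benchmark when both packages are available.
if (requireNamespace("bench", quietly = TRUE) &&
    requireNamespace("rlang", quietly = TRUE)) {
  res <- bench::mark(
    x = { !length(vec) },
    y = { rlang::is_empty(vec) },
    z = { identical(vec, numeric(0)) },
    check = FALSE,
    min_iterations = 1000
  )
  print(res)
}
```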
I am cognizant that there are probably edge cases where the behaviours of these three functions aren't identical, but if, like me, you just need to tell whether a value is c(some, number) or numeric(0) and want to check quickly, use length.
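To make a couple of those edge cases concrete, here is a small base R sketch comparing the length check with the identical() check on a few inputs. The list of candidates is my own illustration, not something taken from the benchmark above. The main difference it shows is that identical() also compares type, so NULL and integer(0) count as empty by length but are not identical to numeric(0).

```r
# Illustrative inputs (hypothetical, chosen to show where the checks diverge).
candidates <- list(
  "numeric(0)" = numeric(0),
  "c(1, 2)"    = c(1, 2),
  "NULL"       = NULL,
  "integer(0)" = integer(0),
  "NA_real_"   = NA_real_
)

# Check 1: empty means length zero.
by_length <- vapply(candidates, function(v) length(v) == 0, logical(1))

# Check 2: empty means exactly identical to numeric(0), including type.
by_identical <- vapply(candidates, function(v) identical(v, numeric(0)), logical(1))

# Side-by-side view; rows are labelled by the candidate names.
print(data.frame(by_length, by_identical))
```

If your inputs can only ever be a double vector or numeric(0), as in my case, the two checks agree and the cheaper one wins.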