In his "Advanced R" book, Hadley Wickham says "noNA(x)
asserts that the vector x
does not contain any missing values." However I still don't know how to use it. I can't do
if (noNA(x))
do this
so how am I supposed to use it?
In his "Advanced R" book, Hadley Wickham says "noNA(x)
asserts that the vector x
does not contain any missing values." However I still don't know how to use it. I can't do
if (noNA(x))
do this
so how am I supposed to use it?
Many of the Rcpp sugar expressions are implemented through template classes which have specializations for cases when the input object is known to be free of missing values, thereby allowing the underlying algorithm to avoid having to perform the extra work of dealing with NA
values (e.g. calls to is_na
). This is only possible because the VectorBase
class has a boolean parameter indicating whether the underlying object can (can, not that it necessarily does) have NA
values, or not.
noNA
returns (when called on a VectorBase
object) an instance of the Nona
template class. Note that Nona
itself derives from
Rcpp::VectorBase<RTYPE, false, Nona<RTYPE,NA,VECTOR>>
// ^^^^^
meaning that the returned object gets encoded with information that essentially says "you can assume that this data is free of NA
values".
As an example, Rcpp::sum
is implemented via the Sum
class in the Rcpp::sugar
namespace. In the default case, we see that there is extra work to manage the possibility of missing values:
STORAGE get() const {
STORAGE result = 0 ;
R_xlen_t n = object.size() ;
STORAGE current ;
for( R_xlen_t i=0; i<n; i++){
current = object[i] ;
if( Rcpp::traits::is_na<RTYPE>(current) ) // here
return Rcpp::traits::get_na<RTYPE>() ; // here
result += current ;
}
return result ;
}
On the other hand, there is also a specialization for cases when the input does not have missing values, in which the algorithm does less work:
STORAGE get() const {
STORAGE result = 0 ;
R_xlen_t n = object.size() ;
for( R_xlen_t i=0; i<n; i++){
result += object[i] ;
}
return result ;
}
To answer your question of "how do I apply this in practice?", here is an example:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int Sum(IntegerVector x) {
return sum(x);
}
// [[Rcpp::export]]
int SumNoNA(IntegerVector x) {
return sum(noNA(x));
}
Benchmarking these two functions,
set.seed(123)
x <- as.integer(rpois(1e6, 25))
all.equal(Sum(x), SumNoNA(x))
# [1] TRUE
microbenchmark::microbenchmark(
Sum(x),
SumNoNA(x),
times = 500L
)
# Unit: microseconds
# expr min lq mean median uq max neval
# Sum(x) 577.386 664.620 701.2422 677.1640 731.7090 1214.447 500
# SumNoNA(x) 454.990 517.709 556.5783 535.1935 582.7065 1138.426 500
the noNA
version is indeed faster.