In his "Advanced R" book, Hadley Wickham says "noNA(x) asserts that the vector x does not contain any missing values." However, I still don't know how to use it. I can't write

if (noNA(x))
    do this

so how am I supposed to use it?

http://adv-r.had.co.nz/Rcpp.html#rcpp-sugar

Rory Nolan
  • Please provide the exact reference, i.e., a link. – Roland Jan 26 '17 at 12:35
  • There's an example on page 14 of [this](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-introduction.pdf) – Miff Jan 26 '17 at 12:40
  • This is essentially a) a FAQ (with the canonical answer [here in this *epic* post by Kevin](http://stackoverflow.com/questions/26241085/rcpp-function-check-if-missing-value/26262984#26262984)) and b) your own making if you trust secondary documentation over primary. – Dirk Eddelbuettel Jan 26 '17 at 13:58
  • I couldn't understand the primary documentation. Hadley's book was the closest I got to something I could understand. Kevin's answer is informative, indeed I have seen it before, however I did not learn from it how `noNA` is used. – Rory Nolan Jan 26 '17 at 14:14
  • @DirkEddelbuettel To be fair, the first Google hit that looks like documentation [is, in fact, completely undocumented](http://dirk.eddelbuettel.com/code/rcpp/html/namespaceRcpp.html#ac6caefefde6b53b704f65b0716d35c20). In fact, I'm not sure which documentation you're referring to here. – Konrad Rudolph Jan 26 '17 at 15:37
  • @Konrad: Contributions are welcome on documentation too. But you can search in the same doxygen tree, or [directly on GitHub](https://github.com/RcppCore/Rcpp/search?utf8=%E2%9C%93&q=noNA) and that *does* give usage examples. – Dirk Eddelbuettel Jan 26 '17 at 22:22

Many Rcpp sugar expressions are implemented as template classes with specializations for the case where the input object is known to be free of missing values, which lets the underlying algorithm skip the extra work of handling NA values (e.g. calls to is_na). This is only possible because the VectorBase class template has a boolean template parameter indicating whether the underlying object *can* have NA values (can, not necessarily does).

When called on a VectorBase object, noNA returns an instance of the Nona template class. Note that Nona itself derives from

Rcpp::VectorBase<RTYPE, false, Nona<RTYPE,NA,VECTOR>>
//                      ^^^^^

meaning that the returned object gets encoded with information that essentially says "you can assume that this data is free of NA values".
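To make the mechanism concrete, here is a stripped-down plain C++ sketch of the same idea. It is not Rcpp's code: `VecBase`, `IntVec`, `NoNAView` and `no_na` are hypothetical stand-ins for `VectorBase`, `IntegerVector`, `Nona` and `noNA`. What it illustrates is that `no_na` returns a new *type* whose base class carries `false`, not a `bool`, which is also why something like `if (noNA(x))` does not make sense.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for Rcpp::VectorBase: the bool template parameter
// records at compile time whether the elements *can* be NA.
template <bool CAN_HAVE_NA, typename Derived>
struct VecBase {
    static constexpr bool can_have_na = CAN_HAVE_NA;
    // CRTP helper: recover the concrete vector type
    const Derived& get_ref() const { return static_cast<const Derived&>(*this); }
};

// Ordinary vector: nothing is known about its contents, so NA is possible.
struct IntVec : VecBase<true, IntVec> {
    std::vector<int> data;
    explicit IntVec(std::vector<int> d) : data(std::move(d)) {}
    int operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Stand-in for Rcpp's Nona: a thin, non-owning view over another vector
// whose base class carries `false` ("assume no NA"). Nothing is checked.
template <typename V>
struct NoNAView : VecBase<false, NoNAView<V>> {
    const V& vec;
    explicit NoNAView(const V& v) : vec(v) {}
    int operator[](std::size_t i) const { return vec[i]; }
    std::size_t size() const { return vec.size(); }
};

// Stand-in for noNA(): wraps its argument and returns a view, not a bool.
template <bool NA, typename V>
NoNAView<V> no_na(const VecBase<NA, V>& x) {
    return NoNAView<V>(x.get_ref());
}
```

For an `IntVec x`, `no_na(x)` yields a `NoNAView<IntVec>` whose `can_have_na` is `false`; the data itself is neither copied nor inspected.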

As an example, Rcpp::sum is implemented via the Sum class in the Rcpp::sugar namespace. In the default case, we see that there is extra work to manage the possibility of missing values:

STORAGE get() const {
    STORAGE result = 0 ;
    R_xlen_t n = object.size() ;
    STORAGE current ;
    for( R_xlen_t i=0; i<n; i++){
        current = object[i] ;
        if( Rcpp::traits::is_na<RTYPE>(current) )   // NA check on every element...
            return Rcpp::traits::get_na<RTYPE>() ;  // ...one NA makes the whole sum NA
        result += current ;
    }
    return result ;
}

On the other hand, there is also a specialization for cases when the input does not have missing values, in which the algorithm does less work:

STORAGE get() const {
    STORAGE result = 0 ;
    R_xlen_t n = object.size() ;
    for( R_xlen_t i=0; i<n; i++){
        result += object[i] ;
    }
    return result ;
}
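Which of the two versions runs is decided at compile time, purely from the NA flag carried in the input's type. A plain C++ sketch of that pattern, using a hypothetical `Sum<bool>` and sentinel `kNA` (Rcpp's real classes are additionally templated on RTYPE and the expression type):

```cpp
#include <climits>
#include <vector>

// Hypothetical NA sentinel; R stores an integer NA as INT_MIN (NA_INTEGER).
constexpr int kNA = INT_MIN;

// Primary template: the input *can* contain NA, so every element is tested,
// mirroring the default get() above.
template <bool CAN_HAVE_NA>
struct Sum {
    static int get(const std::vector<int>& v) {
        int result = 0;
        for (int x : v) {
            if (x == kNA) return kNA;  // a single NA makes the whole sum NA
            result += x;
        }
        return result;
    }
};

// Full specialization picked when the caller has promised "no NA":
// the per-element test disappears from the loop.
template <>
struct Sum<false> {
    static int get(const std::vector<int>& v) {
        int result = 0;
        for (int x : v) result += x;
        return result;
    }
};
```

On NA-free input `Sum<true>::get` and `Sum<false>::get` agree; only the first pays for a comparison on every element.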

To answer your question of how to use it in practice, here is an example:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int Sum(IntegerVector x) {
    return sum(x);
}

// [[Rcpp::export]]
int SumNoNA(IntegerVector x) {
    return sum(noNA(x));
}

Benchmarking these two functions:

set.seed(123)
x <- as.integer(rpois(1e6, 25))

all.equal(Sum(x), SumNoNA(x))
# [1] TRUE

microbenchmark::microbenchmark(
    Sum(x), 
    SumNoNA(x),
    times = 500L
)
# Unit: microseconds
#        expr     min      lq     mean   median       uq      max neval
#      Sum(x) 577.386 664.620 701.2422 677.1640 731.7090 1214.447   500
#  SumNoNA(x) 454.990 517.709 556.5783 535.1935 582.7065 1138.426   500

The noNA version is indeed faster: its median time is about 535 microseconds versus about 677 for Sum(x), roughly 20% less.
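Keep in mind that noNA is a promise, not a test. Nothing verifies it, so if x does contain NA values the fast path silently returns a wrong number (an integer NA is stored as INT_MIN, which simply gets added in). If you cannot guarantee that the input is clean, the `if` has to go around a real check, for example `is_true(any(is_na(x)))` in Rcpp sugar, with the noNA version used only in the NA-free branch. That check costs an extra pass over the data, though, so noNA pays off mainly when you already know the input has no NA, as with the `rpois` data in the benchmark. A plain C++ sketch of the guard pattern, with hypothetical names:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// Hypothetical NA sentinel mirroring R's NA_INTEGER (INT_MIN).
constexpr int kNA = INT_MIN;

// Fast path: trusts the caller, like sum(noNA(x)); no per-element NA test.
int sum_assume_no_na(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) total += x;
    return total;
}

// Guarded wrapper: the `if` tests the data itself, and the fast path
// is used only after the "no NA" promise has actually been verified.
int sum_guarded(const std::vector<int>& v) {
    bool has_na = std::find(v.begin(), v.end(), kNA) != v.end();
    if (has_na) return kNA;
    return sum_assume_no_na(v);
}
```

`sum_guarded` returns 6 for `{1, 2, 3}` and `kNA` for a vector containing `kNA`, whereas `sum_assume_no_na` on that same vector returns an ordinary-looking but meaningless integer.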

nrussell