1

I noticed that unique function from Rcpp orders the results

evalCpp("unique(IntegerVector::create(6,6,1,5,5,1))")
[1] 6 5 1
unique(c(6,6,1,5,5,1))
[1] 6 1 5

Is there a way to avoid this ? Thanks for your help

user3507085
  • 700
  • 5
  • 17

4 Answers4

2

If you look at the (short) source file you see that it uses an internal class IndexHash. I suspect this one sorts by default.

If original order is paramount, I supposed you could write yourself a new convenience wrapper. It cannot be all this hard: at the risk of wasting a few bytes of memory, allocate a temporary logical vector, use a standard hashmap and loop over the incoming vector. For each value, ask if the hashmap has seen this value, store the boolean answer. Then use that to index the original vector.

Chances are this is even implemented somewhere. Also look at Armadillo and Eigen for utility functions.

SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • Thank you for these details. Actually I realized that the easiest way in my case is to call the R unique function inside the Rcpp program ! – user3507085 Jun 22 '17 at 13:31
1

This is how I've implemented it, along with the issue I was trying to solve when I came up with it (using this answer, and it also shows various other solutions and benchmarks).

  template < typename T, int RTYPE >
  inline SEXP sexp_unique( Rcpp::Vector< RTYPE > x ) {
    std::set< T > seen;
    auto newEnd = std::remove_if( x.begin(), x.end(), [&seen]( const T value ) {
      if ( seen.find( value ) != std::end( seen ) ) {
        return true;
      }
      seen.insert( value );
      return false;
    });
    x.erase( newEnd, x.end() );
    return x;
  }


  // returns unique values in their original input order
  inline SEXP get_sexp_unique( SEXP s ) {

    SEXP s2 = Rcpp::clone( s );

    switch( TYPEOF( s2 ) ) {
    case LGLSXP: {
      return sexp_unique< bool, LGLSXP >( s2 );
    }
    case REALSXP: {
      return sexp_unique< double, REALSXP >( s2 );
    }
    case INTSXP: {
      return sexp_unique< int, INTSXP >( s2 );
    }
    case STRSXP: {
      return sexp_unique< char* , STRSXP >( s2 );
    }
    default: Rcpp::stop("unknown vector type");
    }
    return 0;
  }
SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
0

This one might help someone - works only for sorted vectors.

  template <int ITYPE>
  Rcpp::Vector<ITYPE> unique(Rcpp::Vector<ITYPE> x) {
    int n = x.size();
    if (n == 1) return(x);

    Rcpp::Vector<ITYPE> res;
    res.push_back(x(0));


    for (int i = 1; i < n; i++) {
      if (x[i] != x(i - 1)) {
        res.push_back(x(i));
      } 
    }

    return res;
  }


GoGonzo
  • 2,637
  • 1
  • 18
  • 25
  • 1
    This would fail for non-sorted vectors. If you add a call to sort, it would make this algorithm sub optimal complexity. – Joseph Wood Dec 17 '19 at 21:09
0

You may use duplicated(_) == 0 to dedup your vector while keeping the order of first appearance:

IntegerVector x = {6,6,1,5,5,1};
IntegerVector out = x[duplicated(x) == 0];
Rcout << out << std::endl;
zxr6
  • 1
  • 2