how to tell C++ to abandon certain element in vector according to an index indicator

Question

I am using RCPP to speed up the R code in my project. Now what I am doing is to transfer my R code into C++ using Armadillo package. I found I often write multiple lines in C++ to replace one line in R...

Here is my question: I have a vector stored data: Data. Also I have a matrix stored the index of elements I need to access. Please allow me to illustrate my scenario in R first:

> Data
[1] 4 5 6 7 8

And

> index
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    0    0
[3,]    2    0    2

For each row of "index" matrix, I want to get the corresponding elements from the data. In R, I only need to code like this:

> Data[index[1,]]
[1] 4 5 6

> Data[index[2,]]
[1] 4

> Data[index[3,]]
[1] 5 5

i.e. the elements I need from the first row of 'index' matrix is Data[1],Data[2],Data[3]

the elements I need from the 2nd row of 'index' matrix is Data[1]

the elements I need from the 3rd row of 'index' matrix is Data[2] Data[2]

The convenience of R is that R automatically identifies the 0 index as 'nothing' and won't access it.

Now I input the vector 'Data' and matrix 'index' into C now. I was wondering is there any way to achieve the similar result as R above? Thanks a lot!

C or C++? Or both? Your title and intro say C++ but in the final paragraph you mention C. — Jongware, Jan 21 '18 at 21:20
sorry I thought it was similar... IMO I thought C++ is the extension of C that enables objective programming... As long as currently I won't deal with objective programming thus I thought it was the same thing — abc1m2x3c, Jan 22 '18 at 20:42

Sathish · Answer 1 · 2018-01-21T06:22:34.113

Base R

you can subset the data with indices and the result will be a list

Data <- c( 4 ,5 ,6, 7, 8)
index <- matrix(c(1,2,3, 1, 0, 0, 2,0,2), byrow = TRUE, nrow = 3)
apply(index, 1, function(x) Data[x])
# [[1]]
# [1] 4 5 6
# 
# [[2]]
# [1] 4
# 
# [[3]]
# [1] 5 5

The result will be matrix

index <- matrix(c(1,2,3, 1, 0, 0, 2,0,2), byrow = TRUE, nrow = 3)
index[index == 0] <- NA
index
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1   NA   NA
# [3,]    2   NA    2
apply(index, 2, function(x) Data[x])
#      [,1] [,2] [,3]
# [1,]    4    5    6
# [2,]    4   NA   NA
# [3,]    5   NA    5

Using [:

matrix( Data[index], nrow = 3, byrow = FALSE)   # another way to get the same matrix

Rcpp: for 0 index, use NA in the Data vector

you just convert apply to Rcpp code as described here

or

Using [: Refer this article for more info on vector subseting using RCpp

File: mysubset.cpp

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector mysubset(NumericVector Data, NumericVector index) {
  return Data[index];
}

RStudio:

library('Rcpp')
sourceCpp("mysubset.cpp")
Data <- c( NA, 4 ,5 ,6, 7, 8)  # for 0 index, use NA
index <- matrix(c(1,2,3, 1, 0, 0, 2,0,2), byrow = TRUE, nrow = 3)
matrix( mysubset(Data, index), nrow = 3, byrow = FALSE)
#      [,1] [,2] [,3]
# [1,]    4    5    6
# [2,]    4   NA   NA
# [3,]    5   NA    5

mysubset(Data, index[1,])
# [1] 4 5 6
na.omit(mysubset(Data, index[2,]))
# [1] 4

Got it! I learn a lot about R actually! The last solution you provide me is awesome! — abc1m2x3c, Jan 22 '18 at 20:46

smac89 · Answer 2 · 2018-01-21T20:56:26.017

In C++, there is a bit of work to be done, but it is doable:

#include <type_traits>
#include <vector>
#include <iterator>
#include <algorithm>
#include <iostream>
#include <utility>

template <typename T, typename I,
        typename std::enable_if<std::is_convertible<I,
                typename std::vector<T>::size_type>::value>::type* = nullptr>
std::vector<std::vector<T>> product (const std::vector<T>& data,
                                     const std::vector<std::vector<I>>& index) {

    std::vector<std::vector<T>> result (index.size());
    std::transform(std::begin(index),
                   std::end(index),
                   std::make_move_iterator(std::begin(result)),
                   [&data, &filler](const std::vector<I>& index_row) {
                       std::vector<T> row;
                       for (auto& pos : index_row) {
                           if (pos > 0) {
                               row.push_back(data.at(pos - 1));
                           }
                       }
                       return row;
                   });

    return result;
}

Now a demonstration of how this works:

auto main() -> int {
    std::vector<int> data = {4, 5, 6, 7, 8};
    std::vector<std::vector<int>> index = {
        {1, 2, 3},
        {1, 0, 0},
        {2, 0, 2}
    };

    std::vector<std::vector<int>> result = std::move(product(data, index));
    std::cout << result << "\n";
}

Output

4,5,6,
4,
5,5,

Helper functions used in demonstration:

template <typename T>
std::ostream& operator << (std::ostream& oss, const std::vector<T>& v) {
    for (auto &item : v) {
        oss << item << ",";
    }
    return oss;
}

template <typename T>
std::ostream& operator << (std::ostream& oss, const std::vector<std::vector<T>>& vv) {
    for (auto &v : vv) {
        oss << v << "\n";
    }
    return oss;
}

Thank you so much for the code! Maybe I am a bit far from totally understanding. But it is definitely a good template for me to use in RCPP and learn C++! — abc1m2x3c, Jan 22 '18 at 20:44

Rabbid76 · Answer 3 · 2018-01-21T21:45:16.067

If you want to keep the things easy, then I recommend the following:

Lets assume you have a data vector (Data):

std::vector<int> Data{ 4, 5, 6, 7, 8 };

and an index map, which is row major order vector of column vectors (index):

std::vector<std::vector<int>> index{ {1, 2, 3}, {1, 0, 0}, {2, 0, 2} };

Then the following code, will take the indices of a row of index. Takes the indexed element of Data and appends it to the result vector, except the index is 0 (or out of bounds):

std::vector<int> r;
for (auto i : index[1-1])
    if (i > 0 && i <= Data.size())
        r.push_back(Data[i-1]);

The output of the following code

#include <vector>
#include <iostream>

std::vector<int> Data{ 4, 5, 6, 7, 8 };
std::vector<std::vector<int>> index{ {1, 2, 3}, {1, 0, 0}, {2, 0, 2} };

std::vector<int> r1, r2, r3;
for (auto i : index[1-1]) if (i > 0 && i <= Data.size()) r1.push_back(Data[i-1]);
for (auto i : index[2-1]) if (i > 0 && i <= Data.size()) r2.push_back(Data[i-1]);
for (auto i : index[3-1]) if (i > 0 && i <= Data.size()) r3.push_back(Data[i-1]);

for (auto d : r1) std::cout << d << " ";  std::cout << std::endl;
for (auto d : r2) std::cout << d << " ";  std::cout << std::endl;
for (auto d : r3) std::cout << d << " ";  std::cout << std::endl;

is:

4 5 6
4
5 5

Theoretically you would need an algorithm which does, something like std::transform_if. But that doesn't exist. See Why is there no transform_if in the C++ standard library?

That's the most straightforward C++ code I can understand. Thanks a lot! — abc1m2x3c, Jan 22 '18 at 20:47

how to tell C++ to abandon certain element in vector according to an index indicator

3 Answers3

Output