Is there an efficient approach to retain only those rows of an Armadillo sparse matrix whose values sum to at least some total count across the columns of the matrix? For instance, I would want to retain the i-th row if the sum of its values is >= C, where C is some chosen value. Armadillo's documentation says that only contiguous submatrix views are allowed with sparse matrices, so I am guessing this is not easily obtainable by subsetting. Is there an alternative to plainly looping through the elements and creating a new sparse matrix with new locations, values and colPtr settings that match the desired condition? Thanks!

mskb
-
This is unclear: _entertain at least some level of total count across columns_. Can you try to rephrase this in standard terminology? Or give examples of what you mean? – Svaberg Mar 26 '17 at 03:15
-
Sorry about that; I hope the edited version is clearer. – mskb Mar 26 '17 at 15:17
1 Answer
It may well be that the fastest executing solution is the one you propose. If you want to take advantage of high-level Armadillo functionality (i.e. faster to code but perhaps slower to run), you can build a `std::vector` of "bad" row ids and then use `shed_row(id)`. Take care with the indexing when shedding rows: each call shifts the indices of all rows below the removed one. This is handled here by always shedding from the bottom of the matrix.
#include <armadillo>
#include <vector>

// rowind, colptr, values, n_rows and n_cols are assumed to be defined elsewhere.
arma::sp_mat mat(rowind, colptr, values, n_rows, n_cols);
const double threshold_value = 0.01 * arma::accu(mat); // 1% of the sum of all elements
std::vector<arma::uword> bad_ids; // The rows that we want to shed
const arma::vec row_sums(arma::sum(mat, 1)); // Row sums (dim = 1), stored as a dense vector
// Iterate over rows in reverse order, so bad_ids ends up in descending order.
for (arma::uword row_id = mat.n_rows; row_id-- > 0; ) {
    if (row_sums(row_id) < threshold_value) {
        bad_ids.push_back(row_id);
    }
}
// Shed the bad rows from the bottom of the matrix and up.
for (const auto &bad_id : bad_ids) {
    mat.shed_row(bad_id);
}
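
For comparison, the manual route mentioned at the start of this answer (looping over the stored entries and rebuilding the index arrays) might look roughly like the sketch below. It is only an illustration under some assumptions: `Csc` and `keep_rows_with_sum_at_least` are hypothetical names, the struct is a plain-C++ stand-in for compressed sparse column storage rather than an Armadillo type, and indices are `std::size_t` where Armadillo would use `arma::uword`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ stand-in for CSC storage (not an Armadillo type).
struct Csc {
    std::size_t n_rows = 0;
    std::size_t n_cols = 0;
    std::vector<std::size_t> col_ptr;  // size n_cols + 1
    std::vector<std::size_t> row_ind;  // row index of each stored value
    std::vector<double> values;        // stored values, column by column
};

// Keep only the rows whose sum is >= threshold; surviving rows are renumbered.
Csc keep_rows_with_sum_at_least(const Csc& in, double threshold) {
    assert(in.col_ptr.size() == in.n_cols + 1);
    assert(in.row_ind.size() == in.values.size());

    // Pass 1: accumulate row sums over the stored entries only.
    std::vector<double> row_sum(in.n_rows, 0.0);
    for (std::size_t k = 0; k < in.values.size(); ++k) {
        row_sum[in.row_ind[k]] += in.values[k];
    }

    // Map each old row index to its new index, or to the sentinel `dropped`.
    const std::size_t dropped = in.n_rows;  // never a valid new row index
    std::vector<std::size_t> new_index(in.n_rows, dropped);
    std::size_t kept = 0;
    for (std::size_t r = 0; r < in.n_rows; ++r) {
        if (row_sum[r] >= threshold) new_index[r] = kept++;
    }

    // Pass 2: copy surviving entries column by column, rebuilding col_ptr.
    Csc out;
    out.n_rows = kept;
    out.n_cols = in.n_cols;
    out.col_ptr.reserve(in.n_cols + 1);
    out.col_ptr.push_back(0);
    for (std::size_t c = 0; c < in.n_cols; ++c) {
        for (std::size_t k = in.col_ptr[c]; k < in.col_ptr[c + 1]; ++k) {
            const std::size_t r = new_index[in.row_ind[k]];
            if (r != dropped) {
                out.row_ind.push_back(r);
                out.values.push_back(in.values[k]);
            }
        }
        out.col_ptr.push_back(out.values.size());
    }
    return out;
}
```

Because the old-to-new row mapping preserves order, row indices stay sorted within each column, so the resulting arrays should be suitable for the same `sp_mat(rowind, colptr, values, n_rows, n_cols)` constructor used above once converted to Armadillo's index and value vectors. Each stored entry is visited twice, whereas repeated `shed_row` calls each modify the whole matrix.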

Svaberg
-
@mskb - this may not work as each time `.shed_row()` is called, the number and hence the index of rows changes. the loop could be removing the wrong rows. – hbrerkere Mar 27 '17 at 01:23
-
@hbrerkere That is definitely a bug in the code I wrote! I changed it to add the row IDs in reverse order so that the rows are always shed from the bottom of the matrix. Then it should be stable. – Svaberg Mar 27 '17 at 03:02
-
it looks like there is another bug - you may want to change the `uint` to a regular (signed) integer. `row_id` will never become negative, as unsigned integers are never negative. `row_id--` will simply wrap around to a large integer. so the condition `row_id >= 0` will always be true. (also, i don't think `uint` exists in the `arma` namespace). there is a better way to do this with unsigned integers, but it's beyond the scope of this question. – hbrerkere Mar 27 '17 at 10:11
-
@hbrerkere Darn. Thanks! I wanted to use `arma::uword` as it is an armadillo `typedef` analogous to `size_t`. On the good side now I can use [the `-->` operator](http://stackoverflow.com/questions/1642028/what-is-the-operator-in-c) :-). See also http://stackoverflow.com/questions/4205720/iterating-over-a-vector-in-reverse-direction – Svaberg Mar 27 '17 at 10:29
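
The two pitfalls raised in the comments above (indices shifting after each removal, and an unsigned counter never going below zero) can be illustrated without Armadillo. The following is a minimal sketch using `std::vector` and `std::size_t` in place of a sparse matrix and `arma::uword`; the function names `ids_below_descending` and `erase_descending` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collect the indices of entries below `threshold`, highest index first.
std::vector<std::size_t> ids_below_descending(const std::vector<double>& sums,
                                              double threshold) {
    std::vector<std::size_t> ids;
    // `i-- > 0` compares i with 0 first and decrements afterwards, so the body
    // sees n-1, ..., 0. The loop ends when i is 0 before the decrement; the
    // final wrap-around of i is harmless because i is not used afterwards.
    for (std::size_t i = sums.size(); i-- > 0; ) {
        if (sums[i] < threshold) ids.push_back(i);
    }
    return ids;
}

// Erase the given positions. `ids` must be in descending order, so that each
// erase only shifts elements whose indices have already been handled.
void erase_descending(std::vector<double>& v,
                      const std::vector<std::size_t>& ids) {
    for (std::size_t id : ids) {
        assert(id < v.size());
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(id));
    }
}
```

A loop written as `for (std::size_t i = n - 1; i >= 0; --i)` would never terminate, since `i >= 0` is always true for an unsigned type. Removing positions in ascending order would instead require adjusting every later index after each removal.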