What's the fastest way to extract non-zero indices from a byte array in C++

Question

I have a byte array

unsigned char* array=new unsigned char[4000000];
 ...

And I would like to get indices of all non-zero elements of the array.

Of course, I can do the following:

for(int i=0;i<size;i++)
{
    if(array[i]!=0) somevector.push_back(i);
}

Is there any faster algorithm than this?

Update 1: I can see the majority answer is no. I had hoped there were some magical bit operations I was not aware of. Some people suggested sorting, but that is not feasible in this case. Thanks a lot for all your answers.

Update 2: Four years and four months after this question was posted, @wim suggested this answer, which looks promising.

Unless you have special constraints, there is no faster way than looking at each element to check whether it's zero. — Daniel Fischer, Sep 22 '12 at 16:33
How can it be faster than O(n)? Without examining each element, you cannot tell. — Vinayak Garg, Sep 22 '12 at 16:34
Well, one thing you can do is to avoid the float-to-int conversion. I.e. compare with 0.f instead of 0 — Johan Kotlinski, Sep 22 '12 at 16:34
@david - integer comparison is almost certainly going to be faster. — D.Shawley, Sep 22 '12 at 16:39
@david, one question. Does it have to be int or could it be a smaller data type such as a byte? A second question is how sparse is the array meaning how many of the array elements do you expect to be zero? Thirdly, does the array start with zero and then elements become non-zero through some decision and can you at that point of the decision mark whether the element is non-zero? I am wondering if you can change your algorithm. — Richard Chambers, Sep 22 '12 at 16:42
@RichardChambers Those are good questions. 1. Yes, it can be a byte. 2. Very sparse: 5% or 10% will be non-zero. 3. That's a good point. I will see what I can do about it. — Tae-Sung Shin, Sep 22 '12 at 16:44
@David, then I would consider using a byte array to store the value and then doing ULONG comparisons so that you are comparing 4 bytes at a time (32 bit architecture). Then if there is a non-zero, determine which of the bytes are non-zero. This should make a significant difference in speed with 4 byte comparisons at a time. — Richard Chambers, Sep 22 '12 at 16:46
Regarding the whole program, it may be interesting to sort the array. — BenjaminB, Sep 22 '12 at 16:48
To make it faster you can: 1. if you know that most elements are non-zero, test "if(array[i]!=0)", which helps with branch misprediction; 2. use loop unrolling. — URL87, Sep 22 '12 at 18:12
@david 1. You are doing it right. 2. Read http://en.wikipedia.org/wiki/Loop_unwinding; in your case use if(array[i]!=0) .. if(array[i+1]!=0) .. if(array[i+2]!=0) — URL87, Sep 22 '12 at 18:32
@david When I learned about loop unrolling for summing a vector, I understood that it saves about 30% of CPU time per element. In your case I don't really know. — URL87, Sep 22 '12 at 19:16
The answer is yes if you are on the x86 platform, see my [answer](http://stackoverflow.com/a/41958528/2439725) on [this](http://stackoverflow.com/questions/18971401/sparse-array-compression-using-simd-avx2) SO question. This answer assumes AVX2 support (Intel Haswell or newer), but an SSE solution is also possible. — wim, Feb 02 '17 at 09:36

score 4 · Answer 1 · answered Sep 22 '12 at 16:36

Unless your array is ordered, this is the most efficient algorithm for what you want to do in a single-threaded program. You can try to optimize the data structure in which you store the result, but in terms of time complexity this is the best you can do.
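
One way to read the remark about the result container, offered as a rough sketch rather than a measured recommendation: count the non-zero bytes in a first pass, reserve exactly that much space, then fill the vector. The function name `nonzero_indices` and the choice of `std::vector<std::size_t>` are assumptions made for illustration; they are not from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of all non-zero bytes in data[0..size).
std::vector<std::size_t> nonzero_indices(const unsigned char* data, std::size_t size) {
    // Pass 1: count the zero bytes, so the number of non-zero bytes is known up front.
    const std::size_t zeros = static_cast<std::size_t>(
        std::count(data, data + size, static_cast<unsigned char>(0)));

    std::vector<std::size_t> result;
    result.reserve(size - zeros);  // one allocation instead of repeated growth

    // Pass 2: the same linear scan as in the question, now without reallocations.
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] != 0) result.push_back(i);
    }
    return result;
}
```

Whether the extra pass pays for itself depends on the data and the allocator, so it is worth timing against the plain loop before adopting it.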

Vaughn Cato · Answer 2 · 2012-09-23T05:08:30.917

If the non-zero values are relatively rare, one trick you can use is a sentinel value:

unsigned char old_value = array[size-1];
array[size-1] = 1; // make sure we find a non-zero eventually

int i=0;

for (;;) {
  while (array[i]==0) ++i; // tighter loop
  if (i==size-1) break;
  somevector.push_back(i);
  ++i;
}

array[size-1] = old_value;
if (old_value!=0) {
  somevector.push_back(size-1);
}

This avoids having to check both the index and the value on each iteration.

score 1 · Accepted Answer · answered Sep 23 '12 at 17:06

With a sparse byte array, one that is mostly zero, you can take advantage of a 32-bit CPU by doing comparisons 4 bytes at a time. However, if any of the 4 bytes is non-zero, you then have to determine which bytes in the 4-byte word are non-zero, and that takes extra effort. If the array is really sparse, the time saved on the comparisons may compensate for the additional work of finding the non-zero bytes.

The easiest would be to make the unsigned char array sized to some multiple of 4 bytes so that you do not need to worry about doing the last few bytes after the loop completes.

I would suggest doing a timing study on this as it is purely conjectural and there would be a point where an array becomes un-sparse enough that this would take more time than a simple loop.

One question that I would have is what are you doing with the vector of offsets of non-zero elements of the array and whether you can do away with the vector. Another question is if you need the vector whether you can build the vector as you place elements into the array.

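#include <cstdint>
#include <cstring>

unsigned char* array=new unsigned char[4000000];
......
// Walk the array 4 bytes at a time; 4000000 is a multiple of 4, so there is no tail.
for (unsigned char *pUlawByte = array; pUlawByte < array + 4000000; pUlawByte += 4) {
    std::uint32_t ulaw;                          // exactly 4 bytes; unsigned long is 8 bytes on many 64-bit platforms
    std::memcpy(&ulaw, pUlawByte, sizeof ulaw);  // well-defined 4-byte load, no pointer-aliasing issues
    if (ulaw) {
        // at least one byte is non-zero
        if (*pUlawByte)
            somevector.push_back(pUlawByte - array);
        if (*(pUlawByte+1))
            somevector.push_back(pUlawByte - array + 1);
        if (*(pUlawByte+2))
            somevector.push_back(pUlawByte - array + 2);
        if (*(pUlawByte+3))
            somevector.push_back(pUlawByte - array + 3);
    }
}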
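
The timing study suggested above could be set up roughly as follows. This is a minimal sketch, not a rigorous benchmark: `scan_bytes`, `scan_words` and `time_ms` are hypothetical names introduced here, and a single timed call is noisy, so each variant should be run several times on arrays of different densities.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Baseline: the byte-by-byte loop from the question.
std::vector<std::size_t> scan_bytes(const unsigned char* data, std::size_t size) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] != 0) out.push_back(i);
    return out;
}

// Candidate: test 4 bytes at a time, then inspect bytes only in non-zero words.
std::vector<std::size_t> scan_words(const unsigned char* data, std::size_t size) {
    std::vector<std::size_t> out;
    std::size_t i = 0;
    for (; i + 4 <= size; i += 4) {
        std::uint32_t word;
        std::memcpy(&word, data + i, sizeof word);  // well-defined 4-byte load
        if (word == 0) continue;                    // all four bytes are zero
        for (std::size_t k = 0; k < 4; ++k)
            if (data[i + k] != 0) out.push_back(i + k);
    }
    for (; i < size; ++i)  // tail, when size is not a multiple of 4
        if (data[i] != 0) out.push_back(i);
    return out;
}

// Times a single call of scan(data, size) and returns the elapsed milliseconds.
template <typename Scan>
double time_ms(Scan scan, const unsigned char* data, std::size_t size) {
    const auto start = std::chrono::steady_clock::now();
    const std::vector<std::size_t> result = scan(data, size);
    const auto stop = std::chrono::steady_clock::now();
    (void)result;  // the indices themselves are not needed for the timing
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_ms(scan_bytes, array, 4000000)` and `time_ms(scan_words, array, 4000000)` on arrays that are, say, 5%, 10% and 50% non-zero should show where, if anywhere, the word-wise version stops paying off.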

score 0 · Answer 4 · Puppy · answered Sep 22 '12 at 16:34

The only thing you can do to improve the speed is to use concurrency.

Likely that will not help so much - the bottleneck here would be RAM-CPU traffic – Johan Kotlinski Sep 22 '12 at 16:36
yepp, the data already has to be in the cache, otherwise it's going to be slower. – Karoly Horvath Sep 22 '12 at 16:37
Not on multiple-channel boards, where two or more cores can stream from RAM independently. If you had dual or triple channel, you could still double or triple (ignoring the costs of syncing the results) your speed. – Puppy Sep 22 '12 at 17:08

score 0 · Answer 5 · answered Sep 22 '12 at 16:42

This is not really an answer to your question, but I was trying to imagine what problem you are trying to solve.

Sometimes when performing operations on matrices (in mathematical sense), the operations can be improved when you know that the great majority of matrix elements will be zeros (a sparse matrix). You do such an optimization by not using a big array at all, but simply storing pairs {index, value} that indicate a non-zero element.
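
As a rough illustration of that idea, assuming the producer of the data can be changed to record non-zero elements as it writes them, a container of {index, value} pairs might look like this. The class `SparseBytes` and its member names are hypothetical, invented for this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse container: only non-zero bytes are stored, as {index, value} pairs.
class SparseBytes {
public:
    // Records one element; zero values are simply not stored.
    // Callers are assumed to append indices in increasing order.
    void append(std::size_t index, unsigned char value) {
        if (value != 0) entries_.push_back(std::make_pair(index, value));
    }

    // The non-zero indices come straight from the stored pairs,
    // with no scan over a large, mostly-zero array.
    std::vector<std::size_t> nonzero_indices() const {
        std::vector<std::size_t> out;
        out.reserve(entries_.size());
        for (const auto& entry : entries_) out.push_back(entry.first);
        return out;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::pair<std::size_t, unsigned char>> entries_;
};
```

With 5% to 10% of 4,000,000 elements non-zero, this stores a few hundred thousand pairs instead of the full array, at the cost of slower random access by index.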
