Matlab: fast way to sum ones in binary numbers with Sparse structure?

Question

Most answers only address the already-answered question about Hamming weights but ignore the point about find and dealing with the sparsity. Apparently the answer by Shai here addresses the point about find -- but I am not yet able to verify it. My answer here does not utilise the ingenuity of other answers such as the bitshifting but good enough example answer.

Input

>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;  
>> transpose(dec2bin(find(mlf)))

ans =

001
000
000
011
001
010
101

Goal

Fast calculation for the amount of ones in binary numbers with the sparse structure?

Note sure I understand the question, because it seems so simple: if you are working in binary, `sum` does exactly what you want. — Marc Claesen, Nov 07 '13 at 12:07
@MarcClaesen I cannot understand: this sum(dec2bin(10)) returns 194 instead of 2 where its binary is 1010. So the sum does not sum the amount of ones in binary number. — hhh, Nov 07 '13 at 12:10
@hhh How do you define the binary number: a string? A vector? — Luis Mendo, Nov 07 '13 at 12:11
@LuisMendo I convert a DEC number to binary with `dec2bin` as the above example in the q. There may be some fast way with modulo to achieve my goal of getting the number of active vars. — hhh, Nov 07 '13 at 12:12
possible duplicate of [Calculating Hamming weight efficiently in matlab](http://stackoverflow.com/questions/1024904/calculating-hamming-weight-efficiently-in-matlab) — Rody Oldenhuis, Nov 07 '13 at 12:47
@hhh if I undestand correctly, you are not after the sum of bits in each of the returned indices, but rather sum of set bits in the k-th bit in ALL returned indices. Is that correct? — Shai, Nov 12 '13 at 15:09
@Shai unfortunately cannot understand the last comment. So some examples http://pastie.org/8475305, does it make sense? — hhh, Nov 12 '13 at 17:49
If the line on top of your question is still valid I would recommend making a new question. Otherwise I would remove/change it. — Dennis Jaheruddin, Nov 14 '13 at 16:37

score 6 · Answer 1 · edited May 23 '17 at 11:45

6

You can do this in tons of ways. The simplest I think would be

% Example data
F = [268469248 285213696 536904704 553649152];

% Solution 1
sum(dec2bin(F)-'0',2)

And the fastest (as found here):

% Solution 2
w = uint32(F');

p1 = uint32(1431655765);
p2 = uint32(858993459);
p3 = uint32(252645135);
p4 = uint32(16711935);
p5 = uint32(65535);

w = bitand(bitshift(w, -1), p1) + bitand(w, p1);
w = bitand(bitshift(w, -2), p2) + bitand(w, p2);
w = bitand(bitshift(w, -4), p3) + bitand(w, p3);
w = bitand(bitshift(w, -8), p4) + bitand(w, p4);
w = bitand(bitshift(w,-16), p5) + bitand(w, p5);

edited May 23 '17 at 11:45

Community

1
1

answered Nov 07 '13 at 12:42

Rody Oldenhuis

37,726
7
50
96

I have 31 bits in the above example but can have more than that so the solutino 2 means creating p1 p2 ... p31 for each bit? – hhh Nov 07 '13 at 12:44
@hhh: I can't test if this works for 64-bit integers (R2010a doesn't support addition of 2 `uint64`), so if that applies to your case, it needs some tweaking. – Rody Oldenhuis Nov 07 '13 at 12:50
Yes, the solution 2 looks to be a bit faster than the solution 1 -- cannot yet understand what it does, reading. What do you think about other methods mentioned? – hhh Nov 07 '13 at 12:53
1

@hhh: Shai's lookup table is the most elegant I think, however, it involves 3 divisions, 4 `mod`s and a lookup per element. The solution above does no math other than addition and bit shifting (which is pretty cheap on modern CPUs), so it should be faster. – Rody Oldenhuis Nov 07 '13 at 13:05
Yes! I tested it and your solution 2 is providing most of the time faster result than the other solutions. Interesting! +1 – hhh Nov 07 '13 at 13:07
@RodyOldenhuis: +1 this is indeed the fastest method! I posted a comparison below... Perhaps this could be even faster if implemented as a MEX function (or one of the many other methods listed [here](http://stackoverflow.com/q/109023/97160)) – Amro Nov 07 '13 at 14:30

Luis Mendo · Answer 2 · 2013-11-07T12:31:16.950

5

According to your comments, you convert a vector of numbers to binary string representations using dec2bin. Then you can achieve what you want as follows, where I'm using vector [10 11 12] as an example:

>> sum(dec2bin([10 11 12])=='1',2)

ans =

     2
     3
     2

Or equivalently,

>> sum(dec2bin([10 11 12])-'0',2)

For speed, you could avoid dec2bin like this (uses modulo-2 operations, inspired in dec2bin code):

>> sum(rem(floor(bsxfun(@times, [10 11 12].', pow2(1-N:0))),2),2)

ans =

     2
     3
     2

where N is the maximum number of binary digits you expect.

edited Nov 07 '13 at 12:31

answered Nov 07 '13 at 12:13

Luis Mendo

110,752
13
76
147

+1 but you wont be able to use `nnz` to apply it to each row/column, you'll have to use `sum` – Amro Nov 07 '13 at 12:17
ERR The dec2bin(find(mlf)) returns an array where nnz(dec2bin(find(mlfNew))=='1') sums all ones, not ones per line. – hhh Nov 07 '13 at 12:17

score 4 · Answer 3 · edited May 23 '17 at 12:23

4

If you really want fast, I think a look-up-table would be handy. You can simply map, for 0..255 how many ones they have. Do this once, and then you only need to decompose an int to its bytes look the sum up in the table and add the results - no need to go to strings...

An example:

>> LUT = sum(dec2bin(0:255)-'0',2); % construct the look up table (only once)
>> ii = uint32( find( mlf ) ); % get the numbers
>> vals = LUT( mod( ii, 256 ) + 1 ) + ... % lower bytes
          LUT( mod( ii/256, 256 ) + 1 ) + ...
          LUT( mod( ii/65536, 256 ) + 1 ) + ...
          LUT( mod( ii/16777216, 256 ) + 1 );

Using typecast (as suggested by Amro):

>> vals = sum( reshape(LUT(double(typecast(ii,'uint8'))+1), 4, [] ), 1 )';

Run time comparison

>> ii = uint32(randi(intmax('uint32'),100000,1));
>> tic; vals1 = sum( reshape(LUT(typecast(ii,'uint8')+1), 4, [] ), 1 )'; toc, %//'
>> tic; vals2 = sum(dec2bin(ii)-'0',2); toc
>> dii = double(ii); % type issues
>> tic; vals3 = sum(rem(floor(bsxfun(@times, dii, pow2(1-32:0))),2),2); toc

Results:

Elapsed time is 0.006144 seconds.  <-- this answer
Elapsed time is 0.120216 seconds.  <-- using dec2bin
Elapsed time is 0.118009 seconds.  <-- using rem and bsxfun

edited May 23 '17 at 12:23

Community

1
1

answered Nov 07 '13 at 12:15

Shai

111,146
38
238
371

given the range of the numbers shown, a lookup table would be very space-inefficient.. – Amro Nov 07 '13 at 12:18
Look-up-table? I have never played with them, [here](http://www.mathworks.se/help/images/lookup-table-operations.html)? – hhh Nov 07 '13 at 12:18
3

@Amro 32 bit is indeed too space consuming. Thus my proposal is to split ints into their bytes and use LUT for each byte individually. – Shai Nov 07 '13 at 12:22
1

@Shai: good idea, something like `typecast(uint32(268469248),'uint8')` then apply the LUT to each byte – Amro Nov 07 '13 at 12:25
I am a bit lost, could someone give an example? I am very cautious in playing with cellarrays -- I can understand that playing only with bits is probably far faster. What is LUT? – hhh Nov 07 '13 at 12:26
@hhh LUT is the precomputed lookup table, which is simply a vector [1 2 1 1...], meaning: (decimal) 1 contains 1 binary one; (decimal) 2 contains 1; (decimal) 3 contains 2; (decimal) 4 contains 1... – Luis Mendo Nov 07 '13 at 12:38
1

@Shai: careful there was a small bug in the method I initially suggested using `typecast`; integers arithmetic in MATLAB are saturating, so `uint8(255)+1 == uint8(255)`, which gives the wrong result when applying the LUT.. I fixed it in my answer – Amro Nov 08 '13 at 05:35

score 3 · Answer 4 · edited Jun 20 '20 at 09:12

Here is an example to show @Shai's idea of using a lookup table:

% build lookup table for 8-bit integers
lut = sum(dec2bin(0:255)-'0', 2);

% get indices
idx = find(mlf);

% break indices into 8-bit integers and apply LUT
nbits = lut(double(typecast(uint32(idx),'uint8')) + 1);

% sum number of bits in each
s = sum(reshape(nbits,4,[]))

you might have to switch to uint64 instead if you have really large sparse arrays with large indices outside the 32-bit range..

EDIT:

Here is another solution for you using Java:

idx = find(mlf);
s = arrayfun(@java.lang.Integer.bitCount, idx);

EDIT#2:

Here is yet another solution implemented as C++ MEX function. It relies on std::bitset::count:

bitset_count.cpp

#include "mex.h"
#include <bitset>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // validate input/output arguments
    if (nrhs != 1) {
        mexErrMsgTxt("One input argument required.");
    }
    if (!mxIsUint32(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0])) {
        mexErrMsgTxt("Input must be a 32-bit integer dense matrix.");
    }
    if (nlhs > 1) {
        mexErrMsgTxt("Too many output arguments.");
    }

    // create output array
    mwSize N = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(N, 1, mxREAL);

    // get pointers to data
    double *counts = mxGetPr(plhs[0]);
    uint32_T *idx = reinterpret_cast<uint32_T*>(mxGetData(prhs[0]));

    // count bits set for each 32-bit integer number
    for(mwSize i=0; i<N; i++) {
        std::bitset<32> bs(idx[i]);
        counts[i] = bs.count();
    }
}

Compile the above function as mex -largeArrayDims bitset_count.cpp, then run it as usual:

idx = find(mlf);
s = bitset_count(uint32(idx))

I decided to compare all the solutions mentioned so far:

function [t,v] = testBitsetCount()
    % random data (uint32 vector)
    x = randi(intmax('uint32'), [1e5,1], 'uint32');

    % build lookup table (done once)
    LUT = sum(dec2bin(0:255,8)-'0', 2);

    % functions to compare
    f = {
        @() bit_twiddling(x)      % bit twiddling method
        @() lookup_table(x,LUT);  % lookup table method
        @() bitset_count(x);      % MEX-function (std::bitset::count)
        @() dec_to_bin(x);        % dec2bin
        @() java_bitcount(x);     % Java Integer.bitCount
    };

    % compare timings and check results are valid
    t = cellfun(@timeit, f, 'UniformOutput',true);
    v = cellfun(@feval, f, 'UniformOutput',false);
    assert(isequal(v{:}));
end

function s = lookup_table(x,LUT)
    s = sum(reshape(LUT(double(typecast(x,'uint8'))+1),4,[]))';
end

function s = dec_to_bin(x)
    s = sum(dec2bin(x,32)-'0', 2);
end

function s = java_bitcount(x)
    s = arrayfun(@java.lang.Integer.bitCount, x);
end

function s = bit_twiddling(x)
    p1 = uint32(1431655765);
    p2 = uint32(858993459);
    p3 = uint32(252645135);
    p4 = uint32(16711935);
    p5 = uint32(65535);

    s = x;
    s = bitand(bitshift(s, -1), p1) + bitand(s, p1);
    s = bitand(bitshift(s, -2), p2) + bitand(s, p2);
    s = bitand(bitshift(s, -4), p3) + bitand(s, p3);
    s = bitand(bitshift(s, -8), p4) + bitand(s, p4);
    s = bitand(bitshift(s,-16), p5) + bitand(s, p5);
end

The times elapsed in seconds:

t = 
    0.0009    % bit twiddling method
    0.0087    % lookup table method
    0.0134    % C++ std::bitset::count
    0.1946    % MATLAB dec2bin
    0.2343    % Java Integer.bitCount

yes that too :) (sorry for the duplication, I just saw you posted the same thing) — Amro, Nov 07 '13 at 12:43
Excellent summary! More accessible for future users, thank you! — hhh, Nov 07 '13 at 22:42

score 0 · Answer 5 · edited Nov 12 '13 at 15:07

0

This gives you the rowsums of the binary numbers from the sparse structure.

>> mlf=sparse([],[],[],2^31+1,1);mlf(1)=10;mlf(10)=111;mlf(77)=1010;  
>> transpose(dec2bin(find(mlf)))

ans =

001
000
000
011
001
010
101

>> sum(ismember(transpose(dec2bin(find(mlf))),'1'),2)

ans =

     1
     0
     0
     2
     1
     1
     2

Hope someone able to find faster rowsummation!

edited Nov 12 '13 at 15:07

Shai

111,146
38
238
371

answered Nov 12 '13 at 12:06

hhh

50,788
62
179
282

@Shai They did not address the point about sparsity, this is not a question just about Hamming weight. Now the next puzzle is to use the methods in other answer to make this even better. I want to make this question distinct from the Hamming weight question by focusing it to many binary numbers like the question provides (original q had them so better be honest to the original indentation, somehow this is getting only a question about a Hamming weight although that was not the original purpose) – hhh Nov 12 '13 at 13:14
the only thing sparse in your question is the origin of the numbers. I do not see any way around the `find(mlf)`. once you have the indices all methods proposed in the answers are faster than `dec2bin`. If you are looking for a method to speed up `find(mlf)` then you'll have to state it explicitly in your question. – Shai Nov 12 '13 at 13:36
@Shai Good point, added it explicitly. Mind you rethink your circulating downvotes or explain. – hhh Nov 12 '13 at 14:04
have you checked my new answer? – Shai Nov 12 '13 at 17:45
@Shai just logged in, reading. Never played with Mex, saving it as a script? – hhh Nov 12 '13 at 17:50

Shai · Answer 6 · 2013-11-12T18:19:39.383

0

Mex it!

Save this code as countTransBits.cpp:

#include "mex.h"

void mexFunction( int nout, mxArray* pout[], int nin, mxArray* pin[] ) {
mxAssert( nin == 1 && mxIsSparse(pin[0]) && mxGetN( pin[0] ) == 1,
            "expecting single sparse column vector input" );
    mxAssert( nout == 1, "expecting single output" );

    // set output, assuming 32 bits, set to 64 if needed
    pout[0] = mxCreateNumericMatrix( 32, 1, mxUINT32_CLASS, mxREAL );
    unsigned int* counter = (unsigned int*)mxGetData( pout[0] );
    for ( int i = 0; i < 32; i++ ) {
        counter[i] = 0;
    }

    // start working
    mwIndex *pIr = mxGetIr( pin[0] );
    mwIndex* pJc = mxGetJc( pin[0] );
    double*  pr = mxGetPr( pin[0] );
    for ( mwSize i = pJc[0]; i < pJc[1]; i++ ) {
        if ( pr[i] != 0 ) {// make sure entry is non-zero
            unsigned int entry = pIr[i] + 1; // cast to unsigned int and add 1 for 1-based indexing in Matlab
            int bit = 0;
            while ( entry != 0 && bit < 32 ) {
                counter[bit] += ( entry & 0x1 ); // count the lsb
                bit++;
                entry >>= 1; // shift right
            }
        }
    }
}

Compile it in Matlab

>> mex -largeArrayDims -O countTransBits.cpp

Run the code

>> countTransBits( mlf )

Note that the output count in 32 bins lsb to msb.

edited Nov 12 '13 at 18:19

answered Nov 12 '13 at 15:39

Shai

111,146
38
238
371

Filename with `>`? Can you explain how this works? I think I did this but the file disapeared. – hhh Nov 12 '13 at 17:55
1

@hhh - mex files are a way of calling C/C++ code from Matlab. what you need is to save the c++ code to `'countTransBits.cpp'`. Then, in matlab (make sure `countTransBits.cpp` is in your CWD) just `>> mex -O -largeArrayDims countTransBits.cpp`. If all goes well (you might need to setup a mex compiler first: `>>mex -setup`) you'll be able to call `countTransBits` as a regular matlab function. – Shai Nov 12 '13 at 18:06
On OSX 10.9 getting this err http://pastie.org/8475365, apparently some platform compability thing, taking some extra time to fix it... – hhh Nov 12 '13 at 18:10
1

@hhh see my edit. also, you might find http://stackoverflow.com/questions/19887677/trying-to-compile-a-c-mex-file-in-matlab relevant? – Shai Nov 12 '13 at 18:21
Updating xcode did not remove the err, asked the earlier guy having the problem how he solved this, waiting for the answer. – hhh Nov 14 '13 at 15:59
@hhh any news on this issue? – Shai Nov 27 '13 at 11:40
Installing and reinstalling Xcode to see whether changing the name of the dirs affect, takes some time, Xcode is large package. http://stackoverflow.com/a/20243924/164148 – hhh Nov 27 '13 at 13:37

score 0 · Answer 7 · answered Aug 27 '15 at 23:35

The bitcount FEX contribution offers a solution based on the lookup table approach, but is better optimized. It runs more than twice as fast as the bit twiddling method (i.e. the fastest pure-MATLAB method reported by Amro) over a 1 million uint32 vector, using R2015a on my old laptop.

Matlab: fast way to sum ones in binary numbers with Sparse structure?

7 Answers7

EDIT:

EDIT#2:

bitset_count.cpp

Linked