3

Best way to explain this is a demonstration.

There is a collection of numbers. They may be repeated, so:

1110, 0100, 0100, 0010, 0110 ...

The number I am looking for is the one that has a bit set, that does not appear in any of the others. The result is the number (in this case 1 - the first number) and the bit position (or the mask is fine) so 1000 (4th bit). There may be more than one solution, but for this purpose it may be greedy.

I can do it by iteration... For each number N, it is:

N & ~(other numbers OR'd together)

But the nature of bits is that there is always a better method if you think outside the box. For instance numbers that appear more than once will never have a unique bit, and have no effect on ORing.

Matt
  • 7,100
  • 3
  • 28
  • 58
  • 2
    Question? If you're asking whether you can do it without iteration, you can't; there'll have to be some kind of iteration because you must look at all the numbers to come up with an answer. – Artefacto Nov 27 '11 at 13:45
  • Well for one you can make it O(n*wordsize) instead of O(n^2); count* the number of times every bit is 1, then find the first count that is one (that will be your bit-index), then find the first number with the bit set. * see this: http://stackoverflow.com/questions/7793997/ – harold Nov 27 '11 at 13:49
  • @Artefacto, But my method above involves two lots of iteration. What ways to make that better are there? – Matt Nov 27 '11 at 13:49
  • @Thomas, yes, 9 bits. Not a nice number but ah well. – Matt Nov 27 '11 at 13:52
  • Good, then O(n*wordsize) is O(n) with a reasonable constant. – harold Nov 27 '11 at 13:53

4 Answers4

5

You just need to record whether each bit has been seen once or more and if it's been seen twice or more. Unique bits are those that have been seen once or more and not twice or more. This can be done efficiently using bitwise operations.

count1 = 0
count2 = 0

for n in numbers:
    count2 |= count1 & n
    count1 |= n

for n in numbers:
    if n & count1 & ~count2:
        return n

If you don't want to iterate over the numbers twice you can keep track of the some number that you've seen that contains each bit. This might be a good optimisation if the numbers are stored on disk and so streaming them requires disk-access, but of course it makes the code a bit more complex.

examples = [-1] * wordsize
count1 = 0
count2 = 0

for n in numbers:
    if n & ~count1:
        for i in xrange(wordsize):
            if n & (1 << i):
                examples[i] = n
    count2 |= count1 & n
    count1 |= n

for i in xrange(wordsize):
    if (count1 & ~count2) & (1 << i):
        return examples[i]

You might use tricks to extract the bit indexes more efficiently in the loop that sets examples, but since this code is executed at most 'wordsize' times, it's probably not worth it.

This code translates easily to C... I just wrote in Python for clarity.

  • If I could mark two replies as answers I would definitely mark this one. However, Harold got in there first, and this is a good improvement on his answer. +1 anyway, so thanks :) – Matt Nov 27 '11 at 15:10
3

(long version of what I wrote in a comment)

By counting the number of times that the bit at index k is one for every k (there is a trick to do this faster than naively, but it's still O(n)), you get a list of bitlength counters in which a count of 1 means that bit was only one once. The index of that counter (found in O(1) because you have a fixed number of bits) is therefore the bit-position you want. To find the number with that bit set, just iterate of all the numbers again and check whether it has that bit set (O(n) again), if it does it's the number you want.

In total: O(n) versus O(n2) of checking every number against all others.

harold
  • 61,398
  • 6
  • 86
  • 164
1

This method uses less than 2 passes (but alters the input array)

    #include <stdio.h>

    unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17 };
    #define COUNTOF(a) (sizeof(a)/sizeof(a)[0])
    void swap(unsigned *a, unsigned *b)
    {
        unsigned tmp;
        tmp = *a;
        *a = *b;
        *b = tmp;
    }

    int main(void)
    {
    unsigned idx,bot,totmask,dupmask;

    /* First pass: shift all elements that introduce new bits into the found[] array.
    ** totmask is a mask of bits that occur once or more
    ** dupmask is a mask of bits that occur twice or more
    */
    totmask=dupmask=0;
     for (idx=bot=0; idx < COUNTOF(array); idx++) {
         dupmask |= array[idx] & totmask;
         if (array[idx] & ~totmask) goto add;
         continue;

    add:
        totmask |= array[idx];
        if (bot != idx) swap(array+bot,array+idx);
        bot++;
        }
    fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", bot, totmask, dupmask );

    /* Second pass: reduce list of candidates by checking if
    ** they consist of *only* duplicate bits */
    for (idx=bot; idx-- > 0 ; ) {
        if ((array[idx] & dupmask) == array[idx]) goto del;
        continue;
    del:
        if (--bot != idx) swap(array+bot,array+idx);

    }

    fprintf(stdout, "Results[%u]:\n", bot );
    for (idx=0; idx < bot; idx++) {
        fprintf(stdout, "[%u]: %x\n" ,idx, array[idx] );
        }
    return 0;
    }  

UPDATE 2011-11-28 Another version, that does not alter the original array. The (temporary) results are kept in a separate array.

#include <stdio.h>
#include <limits.h>
#include <assert.h>

unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17,32,33,64,96,128,130 };
#define COUNTOF(a) (sizeof(a)/sizeof(a)[0])
void swap(unsigned *a, unsigned *b)
{
    unsigned tmp;
    tmp = *a, *a = *b, *b = tmp;
}


int main(void)
{
unsigned idx,nfound,totmask,dupmask;
unsigned found[sizeof array[0] *CHAR_BIT ];

/* First pass: save all elements that introduce new bits to the left
** totmask is a mask of bits that occur once or more
** dupmask is a mask of bits that occur twice or more
*/
totmask=dupmask=0;
 for (idx=nfound=0; idx < COUNTOF(array); idx++) {
     dupmask |= array[idx] & totmask;
     if (array[idx] & ~totmask) goto add;
     continue;

add:
    totmask |= array[idx];
    found[nfound++] = array[idx];
    assert(nfound <= COUNTOF(found) );
    }
fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", nfound, totmask, dupmask );

/* Second pass: reduce list of candidates by checking if
** they consist of *only* duplicate bits */
for (idx=nfound; idx-- > 0 ; ) {
    if ((found[idx] & dupmask) == found[idx]) goto del;
    continue;
del:
    if (--nfound != idx) swap(found+nfound,found+idx);

}

fprintf(stdout, "Results[%u]:\n", nfound );
for (idx=0; idx < nfound; idx++) {
    fprintf(stdout, "[%u]: %x\n" ,idx, found[idx] );
    }
return 0;
}
wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • The idea of storing elements any time a new bit is found is clever. It would make my solution a bit shorter. –  Nov 28 '11 at 22:01
  • Thanks. The driving force behind all this was that I wanted to avoid the second pass. It seems I could not; but at least I managed to reduce it to 32 steps. BTW the gotos are only there to provoke idiots ;-) – wildplasser Nov 29 '11 at 00:15
0

As pointed out this is not working:

You can XOR together the numbers, the result will give you the mask. And then you have to find the first number which doesn't give 0 for the N & mask expression.

akoso
  • 620
  • 4
  • 10