I need to compute the Hamming distance between bitsets that are represented as char
arrays. This is a core operation, so it must be as fast as possible. I have something like this:
const int N = 32; // 32 always
// returns the number of bits that are ones in a char
int countOnes_uchar8(unsigned char v);
// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
int ret = 0;
for(int i = 0; i < N; ++i, ++pa, ++pb)
{
ret += countOnes_uchar8(*pa ^ *pb);
}
return ret;
}
After profiling, I noticed that operating on int
s is faster, so I wrote:
const int N = 32; // 32 always
// returns the number of bits that are ones in a int of 32 bits
int countOnes_int32(unsigned int v);
// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
const unsigned int *qa = reinterpret_cast<const unsigned int*>(pa);
const unsigned int *qb = reinterpret_cast<const unsigned int*>(pb);
int ret = 0;
for(int i = 0; i < N / sizeof(unsigned int); ++i, ++qa, ++qb)
{
ret += countOnes_int32(*qa ^ *qb);
}
return ret;
}
Questions
1) Is that cast from unsigned char *
to unsigned int *
safe?
2) I work on a 32-bit machine, but I would like the code to work on a 64-bit machine. Does sizeof(unsigned int)
returns 4 in both machines, or is it 8 on a 64-bit one?
3) If sizeof(unsigned int)
returned 4 in a 64-bit machine, how would I be able to operate on a 64-bit type, with long long
?