
I need to compute the Hamming distance between bitsets that are represented as char arrays. This is a core operation, so it must be as fast as possible. I have something like this:

const int N = 32; // 32 always

// returns the number of bits that are ones in a char
int countOnes_uchar8(unsigned char v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
  int ret = 0;
  for(int i = 0; i < N; ++i, ++pa, ++pb)
  {
    ret += countOnes_uchar8(*pa ^ *pb);
  }
  return ret;
}

After profiling, I noticed that operating on ints is faster, so I wrote:

const int N = 32; // 32 always

// returns the number of bits that are ones in a 32-bit int
int countOnes_int32(unsigned int v);

// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
  const unsigned int *qa = reinterpret_cast<const unsigned int*>(pa);
  const unsigned int *qb = reinterpret_cast<const unsigned int*>(pb);

  int ret = 0;
  for(int i = 0; i < N / sizeof(unsigned int); ++i, ++qa, ++qb)
  {
    ret += countOnes_int32(*qa ^ *qb);
  }
  return ret;
}

Questions

1) Is that cast from unsigned char * to unsigned int * safe?

2) I work on a 32-bit machine, but I would like the code to work on a 64-bit machine too. Does sizeof(unsigned int) return 4 on both machines, or is it 8 on a 64-bit one?

3) If sizeof(unsigned int) returned 4 on a 64-bit machine, how could I operate on a 64-bit type, such as long long?

  • you cannot guarantee the maximum size of unsigned int, only the minimum. – OllieB Sep 06 '13 at 13:16
  • How do you calculate the ones? I found that bitset::count could be faster than my own code on some systems, as it takes advantage of a special CPU instruction. – Neil Kirk Sep 06 '13 at 13:17
  • `std::bitset` should already be optimized for this (and for counting ones). Why reinvent it? – Alan Stokes Sep 06 '13 at 13:18
  • @Neil, with uchar8 I use a lookup table, and with int32, one of the tricks from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan (see the sketch after these comments). – ChronoTrigger Sep 06 '13 at 13:28
  • @Alan, I could not choose the data type. I guess that converting the data into a bitset in this function may introduce too much overhead. – ChronoTrigger Sep 06 '13 at 13:28
  • @ChronoTrigger: Just a small suggestion - look at pointer aliasing. It might help a lot in certain cases. –  Sep 06 '13 at 13:30
  • It's just `std::bitset<BITS>(v).count()` where BITS is 8 for char or whatever. If this is critical code I suggest you try it and profile to see if it helps any. – Neil Kirk Sep 06 '13 at 13:32
  • @ChronoTrigger 1st rule of optimizing is to measure, not guess. – Alan Stokes Sep 06 '13 at 13:42
  • @ChronoTrigger: The bit-twiddling trick will be *much* slower than a special-purpose instruction on platforms (such as modern x86 and ARM) that have such a thing. If speed is important, then use that either directly (via intrinsics or assembly) or indirectly (via `std::bitset`), and see if it's faster. – Mike Seymour Sep 06 '13 at 13:50
  • I tried the bitset count (creating the bitset in the hamming function), but it worked slower than the bit twiddling trick. – ChronoTrigger Sep 07 '13 at 07:30
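
For reference, here are minimal sketches of the two bit-count variants discussed in these comments, assuming 8-bit bytes: the Kernighan trick from the bithacks link, and the `std::bitset` count Neil Kirk suggests. Both are sketches, not drop-in replacements.

#include <bitset>

// Kernighan's trick from the bithacks page: v &= v - 1 clears the lowest
// set bit, so the loop runs once per set bit
int countOnes_int32(unsigned int v)
{
    int count = 0;
    for (; v != 0; ++count)
        v &= v - 1;
    return count;
}

// the std::bitset alternative suggested above; on some platforms this
// compiles down to a popcount instruction
int countOnes_uchar8(unsigned char v)
{
    return static_cast<int>(std::bitset<8>(v).count());
}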

2 Answers


Is that cast from unsigned char * to unsigned int * safe?

Formally, it gives undefined behaviour. Practically, it will work on just about any platform if the pointer is suitably aligned for unsigned int. On some platforms, it may fail, or perform poorly, if the alignment is wrong.
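
If you want to stay within defined behaviour, one option (just a sketch, not the only way) is to copy each word out with memcpy; for a small fixed size like this, compilers generally turn the copy into a plain load. countOnes_int32 is the bit-count from the question, and N is assumed to be 32.

#include <cstring>

// the bit-count declared in the question
int countOnes_int32(unsigned int v);

int hamming_memcpy(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    const int step = static_cast<int>(sizeof(unsigned int));
    for (int i = 0; i < 32; i += step)           // N == 32
    {
        unsigned int a, b;
        std::memcpy(&a, pa + i, sizeof a);       // well-defined for any alignment
        std::memcpy(&b, pb + i, sizeof b);
        ret += countOnes_int32(a ^ b);
    }
    return ret;
}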

Does sizeof(unsigned int) returns 4 in both machines, or is it 8 on a 64-bit one?

It depends. Some platforms have 64-bit int, and some have 32-bit. It would probably make sense to use uint64_t regardless of platform; on a 32-bit platform, you'd effectively be unrolling the loop (processing two 32-bit values per iteration), which might give a modest improvement.
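
For illustration, a sketch of that with uint64_t; countOnes_int64 is a hypothetical 64-bit counterpart of the question's bit-count, and memcpy is used instead of the cast to keep the accesses well-defined.

#include <cstdint>
#include <cstring>

// hypothetical 64-bit bit-count: a lookup table, the bit-twiddling trick
// extended to 64 bits, or std::bitset<64>(v).count()
int countOnes_int64(std::uint64_t v);

int hamming64(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    const int step = static_cast<int>(sizeof(std::uint64_t));   // 8 bytes per chunk
    for (int i = 0; i < 32; i += step)                          // four iterations for N == 32
    {
        std::uint64_t a, b;
        std::memcpy(&a, pa + i, sizeof a);
        std::memcpy(&b, pb + i, sizeof b);
        ret += countOnes_int64(a ^ b);
    }
    return ret;
}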

how would I be able to operate on a 64-bit type, with long long?

uint64_t, if you have a C++11 or C99 library. long long is at least 64 bits, but might not exist on a pre-2011 implementation.

Mike Seymour
  • `*qa ^ *qb` reads an `int` from where the pointer to `char` points. If you are accessing an array of bytes the code will fail. – Sergey K. Sep 06 '13 at 13:30
  • "if the pointer is suitably aligned for unsigned int". Does this mean that if I write `qa += 1`, instead of advancing `sizeof(unsigned int)` bytes, it can advance a different amount of bytes? – ChronoTrigger Sep 06 '13 at 13:31
  • @SergeyK.: What do you mean, "fail"? As my answer says, it gives formally undefined behaviour, but will work as required on most platforms given suitable alignment. – Mike Seymour Sep 06 '13 at 13:32
  • @ChronoTrigger: yes, the pointer should be aligned. But if the padding bytes are not 0's the result will be wrong. – Sergey K. Sep 06 '13 at 13:33
  • @ChronoTrigger: I'm not quite sure what you're asking here. `qa += 1` *does* advance by `sizeof(unsigned int)` bytes, since that's the type `qa` points to. – Mike Seymour Sep 06 '13 at 13:33
  • @MikeSeymour: I mean fail==UB. – Sergey K. Sep 06 '13 at 13:33
  • @MikeSeymour: your answer is quite good; however, I kind of disagree with your `Practically, ...` part. – Sergey K. Sep 06 '13 at 13:34
  • @SergeyK.: If undefined behaviour isn't acceptable, then just stop reading after the first sentence of the answer. – Mike Seymour Sep 06 '13 at 13:34
  • @ChronoTrigger This is implementation dependent but say a system likes integers to be on addresses that are divisible by the number of bytes, say 4. So an int will be placed on address 0, 4, 8... but never 1. If you have a char pointer to address 8, increase it by 1 to address 9, and then cast this to int pointer, you are forcing an integer read on a non-aligned address. Whether it works and the performance penalty depends on the architecture. If your code was faster than char pointers, I think by chance or fluke your arrays have happened to be int aligned so far. – Neil Kirk Sep 06 '13 at 13:36
  • Isn't `char` already implemented as an `unsigned` type? Isn't `unsigned char` a repetition? – user2485710 Sep 06 '13 at 13:56
  • @user2485710 No. Whether char is signed or unsigned is implementation defined. – Neil Kirk Sep 06 '13 at 14:00
  • @NeilKirk on what platforms a `char` is signed ? – user2485710 Sep 06 '13 at 14:01
  • @user2485710 In my experience `char` is almost always signed, and there is generally a compiler option to force it one way or the other if not. The only compiler I remember where it was unsigned by default was Watcom. – Retired Ninja Sep 06 '13 at 14:07
  • @user2485710 You should not care. If you care, then add signed/unsigned in front of it. – Neil Kirk Sep 06 '13 at 14:25
  • Is there any way then to safely read those chars (1 byte) as ints (several bytes)? – ChronoTrigger Sep 07 '13 at 07:37

1) No, it is not safe or portable; it is undefined behavior. There are systems where char is larger than one byte, and there is no guarantee that the char pointer is properly aligned.

2) sizeof(int) might in theory be anything on a 64-bit machine. In practice, it will be either 4 or 8.

3) long long is at least 64 bits, but there is no guarantee it is exactly 64. If you want guarantees, use uint64_t. However, for your specific algorithm I don't see why the size of the data chunk would matter.

Consider using the types in stdint.h instead; they are far more suitable for portable code. Instead of char, int, or long long, use uint_fast8_t. This will let the compiler pick the fastest integer for you, in a portable manner.

As a side note, you should consider implementing "countOnes" as a lookup table, working at the 4-, 8-, or 32-bit level, depending on what is optimal for your system. This will increase program size but reduce execution time. Maybe try some form of adaptive lookup table that depends on sizeof(uint_fast8_t).
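
For example, here is a sketch of the 8-bit flavour; popcount_table and init_popcount_table are made-up names, and countOnes_uchar8 from the question simply indexes the table.

// one possible byte-wide lookup table; it could just as well be written
// out as 256 hard-coded constants
static unsigned char popcount_table[256];

// call once at startup to fill the table
static void init_popcount_table(void)
{
    for (int b = 0; b < 256; ++b)
    {
        int count = 0;
        for (int v = b; v != 0; v >>= 1)
            count += v & 1;
        popcount_table[b] = (unsigned char)count;
    }
}

int countOnes_uchar8(unsigned char v)
{
    return popcount_table[v];
}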

Lundin
  • `char` is never larger than a byte. A byte might be larger than 8 bits, though. – Carl Norum Sep 06 '13 at 14:19
  • @CarlNorum Computer science defines a byte as a collection of 8 bits. Neither the C language nor some obsolete, obscure CPU type can change that. And anything with more bits than 8 is per that definition larger than one byte. – Lundin Sep 06 '13 at 17:13
  • Strictly speaking, none of that is right, either. I'll certainly agree it's nitpicky. http://en.wikipedia.org/wiki/Byte. There are plenty of DSPs that have 16-bit bytes so it's hardly obscure. There's a reason the CHAR_BIT macro exists, and also why the language standard tries hard to avoid the word Byte in general. – Carl Norum Sep 06 '13 at 18:57