-6

I am not a C++ programmer, but this algorithm appeared in an operating manual for a machine I'm using and I'm struggling to make sense of it. I'd like someone to explain some terms within it, or possibly the flow of the entire code, given that I don't have time to learn C in the course of the project.

Waveform files for the machine in question are made up of a number of tags in curly brackets. The checksum is calculated using the WAVEFORM tag, {WAVEFORM-length: #data}.

The "data" consists of a number of bytes represented as hexadecimal numbers. "length" is the number of bytes in the "data", while "start" apparently points to the first byte in the "data".

I've managed to work out some of the terms, but I'm particularly unsure about my interpretation of ((UINT32 *)start)[i]

UINT32 checksum(void *start, UINT32 length)
{
    UINT32 i, result = 0xA50F74FF;
    for(i=0; i < length/4; i++)
        result = result ^ ((UINT32 *)start)[i];
    return(result);
}

So from what I can tell, the code does the following:

  1. Take the address of the first byte in the "data" and the length of the "data"
  2. Create a variable called result, which is an unsigned integer A50F74FF
  3. For each byte in the first 25% of the data string, raise "result" to that power (presumably modulo 2^32)
  4. Return result as the value checksum

Am I correct here or have I misread the algorithm on one of the steps? I feel like I can't be correct, because basing a checksum on only part of the data wouldn't spot errors in the later parts of the data.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
Patrick
  • 1
  • 1
  • And you should probably [get a couple of good books too](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list/388282#388282). The operator `^` is not a power or "raise" operator, it's the bitwise [exclusice or](https://en.wikipedia.org/wiki/Exclusive_or) operation. – Some programmer dude Jul 22 '19 at 13:17
  • 3
    `^` is not *raise to the power*, it's a bitwise exclusive-or. Otherwise you're more or less correct. – john Jul 22 '19 at 13:17
  • You are simply XORing `result` with each *double-word* (32-bits) in `start` and then returning `result`. – David C. Rankin Jul 22 '19 at 13:19
  • Also note that the function is prone to *undefined behavior* because it could possibly break [strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). And it requires the length to be a multiple of 4, otherwise it will not use all the bytes for the checksum. – Some programmer dude Jul 22 '19 at 13:20

2 Answers2

3
  1. For each byte in the first 25% of the data string, raise "result" to that power (presumably modulo 2^32)

This is wrong. ^ is the bitwise XOR operation. It does not raise to a power.

Also, about "of the data string". The algorithm iterates the pointed data as if it is an array of UINT32. In fact, if start doesn't point to (an element of) an array of UINT32, then the behaviour of the program is undefined1. It would be much better to declare the argument to be UINT32* in the first place, and not use the explicit cast.

Also, about "For each byte in the first 25% of the data string", the algorithm appears to go through (nearly2) all bytes from start to start + length. length is presumably measured in bytes, and UINT32 is presumably a type that consists of 4 bytes. Thus an array of UINT32 objects of N bytes contains N/4 elements UINT32 of objects. Note that this assumes that the byte is 8 bits wide which is probably an assumption that the manual can make, but keep in mind that it is not an assumption portable to all systems.

1 UB as far as the C++ language is concerned. But, if it's shown in the operating manual of a machine, then perhaps the special compiler for the particular hardware specifies defined behaviour for this. That said, it is also quite possible for the author of the manual to have made a mistake.

2 If length is not divisible by 4, then the remaining 1-3 bytes are not used.

eerorika
  • 232,697
  • 12
  • 197
  • 326
2

So the pseudocode for this function is roughly like this:

function checksum(DATA)
    RESULT = 0xA50F74FF;
    for each DWORD in DATA do
        RESULT = RESULT xor DWORD
    return RESULT

where DWORD is a four-byte integer value.

The function is actually going though (almost) all of the data (not 25%) but it's doing it in 4-byte increments that's why the length which is in bytes is divided by 4.

r3mus n0x
  • 5,954
  • 1
  • 13
  • 34
  • Thanks for this. It's not quite clear to me, from the original code, why start[0] is the beginning, but start[1] is the fifth byte. Is it that the UINT32 tells it that each address must be 32 bits long, or is there some other reason? – Patrick Jul 22 '19 at 13:44
  • @Patrick, `(UINT32 *)` is telling the compiler to interpret `start` as a pointer to 4-byte integer and since pointers in C can be used as arrays each `((UINT32 *)start)[i]` is i-th 4-byte element in an array that starts at `start`. – r3mus n0x Jul 22 '19 at 13:57