4

I am using pySerial to read in data from an attached device. I want to calculate the checksum of each received packet. The packet is read in as a char array, with the actual checksum being the very last byte, at the end of the packet. To calculate the checksum, I would normally sum over the packet payload, and then compare it to the actual checksum.

Normally in a language like C, we would expect overflow, because the checksum itself is only one byte. I'm not sure about the internals of python, but from my experience with the language it looks like it will default to a larger size variable (maybe some internal bigInt class or something). Is there anyway to mimic the expected behavior of adding two chars, without writing my own implementation? Thanks.

3 Answers3

8

Sure, just take the modulus of your result to fit it back in the size you want. You can do the modulus at the end or at every step. For example:

>>> payload = [100, 101, 102, 103, 104] # arbitrary sequence of bytes
>>> sum(payload) % 256 # modulo 256 to make the answer fit in a single byte
254 # this would be your checksum
Kiv
  • 31,940
  • 6
  • 44
  • 59
3

to improve upon the earlier example, just bitwise-and it with 0xFF. not sure if python does the optimization by default or not.

sum(bytes) & 0xFF
nategood
  • 11,807
  • 4
  • 36
  • 44
  • I'm not sure what Python does under the hood, but a quick test indicates that both methods take almost the same amount of time. The summation is going to be the bulk of the work in any case. – Kiv Jun 10 '09 at 23:53
  • Most decent optimizing compilers will do a strength reduction and implement your "% (2^n)" as an "& ((1 << n)-1)". Wouldn't be too hard for the Python interpreter to do the same. – Matt J Jun 11 '09 at 00:18
  • 1
    Looking at the output of "dis.dis(lambda x:x%256)", it looks like this optimisation isn't applied. The small difference in runtime is probably due to the fact that the actual and/mod operation is a very tiny part of the total cost. – Brian Jun 11 '09 at 08:34
  • 2
    are you people seriously worried about the performance of a math operation, when your data is coming through a serial port ???? ;) – JimB Jun 11 '09 at 13:39
  • The CPython interpreter executes hundreds of instructions per bytecode instruction. `&` and `%` result in the same number of bytecode instructions. Their corresponding CPU instruction takes up no time, compared to all the overhead. [Related.](https://stackoverflow.com/questions/1396564/why-is-subtraction-faster-than-addition-in-python) (Besides, `&` and `%` would probably execute in the same clock cycle on a modern processor anyways.) – Mateen Ulhaq Mar 24 '19 at 02:58
1

Summing the bytes and then taking the modulus, as in sum(bytes) % 256 (or sum(bytes) & 0xFF), is (in many programming languages) vulnerable to integer overflow, since there is a finite maximum value that integer types can represent.

But, since we are talking about Python, this is not technically an issue: Python integers are arbitrary-precision, so an integer overflow can't occur.

If you want to perform the modulus operation on an element-by-element basis, you can use functools.reduce():

>>> payload = [100, 101, 102, 103, 104] # arbitrary sequence of bytes
# (Python 3 uses functools.reduce() instead of builtin reduce() function)
>>> import functools
>>> functools.reduce(lambda x,y: (x+y)%256, payload)
254
Colin D Bennett
  • 11,294
  • 5
  • 49
  • 66