
I want to extract data from a file whose contents are stored big-endian and always unsigned. How does the "cast" from unsigned int to int affect the actual decimal value? Am I correct that the leftmost bit decides whether the value is positive or negative?

I want to parse that file format with Python, and reading an unsigned value is easy:

def toU32(bits):
    return ord(bits[0]) << 24 | ord(bits[1]) << 16 | ord(bits[2]) << 8  | ord(bits[3])

But what would the corresponding toS32 function look like?


Thanks for the info about the struct module, but I am still interested in a solution to my actual question.

zvrba
Niklas R
  • I think you want the struct module. – jterrace Feb 09 '12 at 18:12
  • Used it before, but that didn't deliver the correct results. Uhm, wait, I guess I need to tell it what endianness the byte-stream has? xD – Niklas R Feb 09 '12 at 18:14
  • @jterrace Man, didn't know I could tell it about the endianness. Thanks jterrace. However, I would still be interested in the solution. :) – Niklas R Feb 09 '12 at 18:16
  • The most significant bit does define the sign of the number. I know what you mean by "leftmost", but it can get confusing when talking about big endian and little endian. – grieve Feb 09 '12 at 18:25

3 Answers


I would use struct.

import struct

def toU32(bits):
    return struct.unpack_from(">I", bits)[0]

def toS32(bits):
    return struct.unpack_from(">i", bits)[0]

The format string ">I" means: read a big-endian (">") unsigned integer ("I") from the string bits. For signed integers, use ">i" instead.
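
As a quick illustration of the two format strings, the same four bytes can be decoded both ways. This is only a minimal sketch in Python 3 syntax; the sample bytes are an arbitrary choice and are not taken from the file format in the question.

```python
import struct

# Four big-endian bytes chosen arbitrarily for illustration.
raw = b"\xff\xff\xff\xfb"

unsigned_value = struct.unpack(">I", raw)[0]  # read as unsigned 32-bit
signed_value = struct.unpack(">i", raw)[0]    # read as signed 32-bit (two's complement)

print(unsigned_value)  # 4294967291
print(signed_value)    # -5
```

The two results differ by exactly 2**32, which is what having the most significant bit set amounts to in two's complement.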

EDIT

I had to look at another Stack Overflow answer to remember how to "convert" an unsigned integer to a signed one in Python, though it is less a conversion and more a reinterpretation of the bits.

import struct

def toU32(bits):
    b = bytearray(bits)  # yields integers on both Python 2 and Python 3
    return b[0] << 24 | b[1] << 16 | b[2] << 8 | b[3]

def toS32(bits):
    candidate = toU32(bits)
    if candidate >> 31:  # is the sign bit set?
        return -0x80000000 + (candidate & 0x7fffffff)  # "cast" it to signed
    return candidate


# Round-trip check: pack small signed values, then decode them both ways.
for x in range(-5, 5):
    bits = struct.pack(">i", x)
    print(toU32(bits))
    print(toS32(bits))
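
On Python 3, the hand-written reinterpretation can be cross-checked against the built-in int.from_bytes, which accepts a signed flag. The sketch below is an aside rather than part of the original answer: the helper name to_s32_manual is made up for illustration, and it takes an already-decoded unsigned value instead of raw bytes.

```python
import struct

def to_s32_manual(value):
    """Reinterpret an unsigned 32-bit integer as signed (hypothetical helper)."""
    if value >> 31:  # sign bit set
        return -0x80000000 + (value & 0x7fffffff)
    return value

# Compare the manual version with int.from_bytes for a few small values.
for x in range(-5, 5):
    raw = struct.pack(">i", x)
    unsigned_value = int.from_bytes(raw, "big", signed=False)
    signed_value = int.from_bytes(raw, "big", signed=True)
    assert to_s32_manual(unsigned_value) == signed_value == x
```

If the loop finishes without an AssertionError, the manual version agrees with the built-in decoding for the tested values.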
grieve
  • Thanks for the information, didn't know that. :) However, I'd still be interested in the hard-coded version. – Niklas R Feb 09 '12 at 18:45

The non-conditional version of toS32(bits) could be something like:

def toS32(bits):
    decoded = toU32(bits)
    return -(decoded & 0x80000000) + (decoded & 0x7fffffff)

Of course, you can precompute the two masks for any other bit width too.
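
One way to generalize the masks to other widths might look like the following. This is a sketch, not the answerer's code: the function name to_signed and its width parameter are assumptions introduced here for illustration.

```python
def to_signed(value, width):
    """Reinterpret an unsigned width-bit integer as two's complement (hypothetical helper)."""
    sign_mask = 1 << (width - 1)   # e.g. 0x80000000 for width 32
    value_mask = sign_mask - 1     # e.g. 0x7fffffff for width 32
    return -(value & sign_mask) + (value & value_mask)

print(to_signed(0xff, 8))         # -1
print(to_signed(0x7f, 8))         # 127
print(to_signed(0x8000, 16))      # -32768
print(to_signed(0xfffffffb, 32))  # -5
```

The sign bit contributes -2**(width - 1) when set and the remaining bits contribute their usual positive weights, which is exactly the two's complement interpretation.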

Pablo Sole

I would use the struct module's pack and unpack functions.

See Endianness of integers in Python for some examples.

Nick Haddad