I have a hash function in Python. It returns a value.

How do I see the byte-size of this return value? I want to know if it is 4-bytes or 8 or what.

Reason:

  • I want to make sure that the min value is 0 and the max value is 2**32 - 1; otherwise my calculations are incorrect.
  • I want to make sure that packing it with the struct format `I` (unsigned int) is correct.

More specifically, I am calling murmur.string_hash(`x`). I want to sanity-check that I am getting a 4-byte unsigned return value. If I get a value of a different size, then my calculations get messed up.

Joseph Turian
  • Not an *exact* duplicate, but see here: [How do I determine the size of an object in python](http://stackoverflow.com/questions/449560/how-do-i-determine-the-size-of-an-object-in-python) – Randolpho Jun 25 '10 at 19:34
  • Short answer: [sys.getsizeof](http://docs.python.org/library/sys.html#sys.getsizeof) – Randolpho Jun 25 '10 at 19:35
  • Does that mean you simply want to check that `0 <= value < 2**32`? Or whether you can pack it as such (just `try`)? – Jochen Ritzel Jun 25 '10 at 21:17
  • This seems to be impossible. For 2.x Python you can check whether the type is `int`, then the byte count can be extracted from `sys.maxint`. With `long` values or Python 3.x you have no chance, Python simply doesn't expose this information. I think that if you really require integer types with well-known sizes (and in most cases you don't), then standard Python types are the wrong choice. On the other hand, `numpy.int32` is always 32 bits wide. – Philipp Jun 25 '10 at 21:24
  • You can easily force the value to be less than 2^32 during calculations by using a bitwise AND (`a & 0xffffffff`) and similar operations. Just make sure you don't turn it into a float by accident. – SigTerm Jun 25 '10 at 21:59

2 Answers

If it's an arbitrary function that returns a number, there are only 4 standard types of numbers in Python 2: plain integers (C long, at least 32 bits), long integers ("unlimited" precision), floats (C double), and complex numbers.

If you are referring to the builtin hash, it returns a standard integer (C long):

 >>> hash(2**31)
 -2147483648

If you want different hashes, check out hashlib.
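
As a rough sketch of how a hashlib digest could be reduced to a 4-byte unsigned value: the helper name `hash32` is hypothetical, the choice of SHA-256 and of the first four digest bytes is an assumption made for illustration, and the result is not compatible with Murmur. The code is written for Python 3.

```python
import hashlib
import struct


def hash32(data: bytes) -> int:
    """Derive an unsigned 32-bit int from the first 4 bytes of a SHA-256 digest.

    Hypothetical helper for illustration; it does not reproduce Murmur output.
    """
    digest = hashlib.sha256(data).digest()  # always 32 bytes
    # "<I" is a little-endian unsigned int with a standard size of 4 bytes.
    (value,) = struct.unpack("<I", digest[:4])
    return value
```

Because the value comes from unpacking exactly four bytes as an unsigned int, it always lies between 0 and 2**32 - 1.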

Nick T

Generally, thinking of a return value as a particular byte precision in Python is not the best way to go, especially with integers. For most intents and purposes, Python "short" integers are seamlessly integrated with "long" (unlimited) integers. Variables are promoted from the smaller to the larger type as necessary to hold the required value. Functions are not required to return any particular type (the same function could return different data types depending on the input, for example).
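
Since integers grow as needed, the number of bytes a value would occupy depends on the value itself rather than on its type. A minimal sketch of measuring that, assuming Python 2.7+ or 3.1+ (where `int.bit_length()` exists) and a non-negative value; the helper name `min_bytes` is hypothetical:

```python
def min_bytes(value):
    """Smallest number of bytes that can hold a non-negative int."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    # bit_length() is 0 for the value 0, so report at least one byte.
    return max(1, (value.bit_length() + 7) // 8)
```

A result of 4 or less means the value fits in an unsigned 4-byte field. Note that `sys.getsizeof` answers a different question: it reports the memory used by the Python object, including interpreter overhead.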

When a function is provided by a third-party package (as this one is), you can either just trust the documentation (which for Murmur indicates 4-byte ints as far as I can tell) or test the return value yourself before using it (whether by if, assert, or try, depending on your preference).
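
One way such a test could look, as a sketch for Python 3: the helper names `check_uint32` and `pack_uint32` are hypothetical, and the explicit little-endian format `"<I"` is an assumption chosen so the packed size is always 4 bytes regardless of platform.

```python
import struct

UINT32_MAX = 2**32 - 1  # largest value that fits in 4 unsigned bytes


def check_uint32(value):
    """Return value unchanged if it is an int in [0, 2**32 - 1], else raise."""
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    if not 0 <= value <= UINT32_MAX:
        raise ValueError(f"{value} does not fit in 4 unsigned bytes")
    return value


def pack_uint32(value):
    """Pack value as a 4-byte little-endian unsigned int."""
    # "<I" uses the standard size, which is always 4 bytes.
    return struct.pack("<I", check_uint32(value))
```

The return value of the third-party hash would be passed through `check_uint32` once, right after the call, so that any surprise surfaces before it reaches later calculations.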

John Y