code.py:
#!/usr/bin/env python3

import sys
import ctypes


def get_long_data(long_obj):
    # For 0, sys.getsizeof returns the size of the object header alone
    py_obj_header_size = sys.getsizeof(0)
    number_size = sys.getsizeof(long_obj) - py_obj_header_size
    number_address = id(long_obj) + py_obj_header_size
    return number_address, number_size, long_obj < 0


def hex_repr(number, size=0):
    # Zero-padded uppercase hex, with the sign placed before the 0x prefix
    format_base = "0x{{:0{:d}X}}".format(size)
    if number < 0:
        return ("-" + format_base).format(-number)
    else:
        return format_base.format(number)


def main():
    numbers = [0x00,
        0x01,
        -0x01,
        0xFF,
        0xFFFF,
        0x00FFFFFF,
        0x12345678,
        0x3FFFFFFF,
        0x40000000,
        0x1111111111
    ]
    for number in numbers:
        address, size, negative = get_long_data(number)
        print("Number: {:s}".format(hex_repr(number, size)))
        # Read the raw digit bytes straight from the object's memory
        buf = ctypes.string_at(address, size)
        print(" Address: {:s}, Size: {:d}, Negative: {:},\n Data: {:}".format(hex_repr(address, size=16), size, negative, buf))
        print(" ({:d}).to_bytes(): {:}".format(number, number.to_bytes(size, sys.byteorder, signed=(number < 0))))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:
- get_long_data is the function that does the work (everything else is just for display / test purposes)
- The address alone is not enough to reconstruct the number, which is why the size (in bytes) and the sign of the number are returned as well
The code relies on [Python 3]: PyLongObject's structure (most of the int functionality is located in [GitHub]: python/cpython - (master) cpython/Objects/longobject.c). Its definition is below:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
- The array at the end holds the actual number value; it can grow to as many elements as needed, which is why numbers in Python can get so big
- For 0, sys.getsizeof only returns PyObject_VAR_HEAD's size; that is used to get the array offset inside the structure
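- [Python 3]: int.to_bytes(length, byteorder, *, signed=False) is used for verification, but note that it will match our output only if 0 <= n < 2 ** 30. The method does some processing on the array contents; it doesn't directly store the raw data into the returned byte stream. In the output, 0x40000000 (2 ** 30) is the first value that needs a second array element, and -1 is stored as the value 1 with the sign kept separately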
- In the output buffer, the bytes within each 4-byte group appear reversed compared to the number's hex representation (0x12345678 is the clearest example); that is because of little endianness (see [SO]: Python struct.pack() behavior (@CristiFati's answer) for more details)
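The layout described in the notes can be imitated in pure Python, which may make the 2 ** 30 limit easier to see. This is only a sketch under stated assumptions: 30 bits of the absolute value per array element, each element stored in a 4-byte little-endian slot (what the 64-bit build in the output below appears to use; sys.int_info reports the actual values for a given interpreter). raw_digits is a hypothetical helper, not part of CPython:

```python
import struct

BITS_PER_DIGIT = 30  # assumption: 30-bit digits (see sys.int_info.bits_per_digit)
DIGIT_MASK = (1 << BITS_PER_DIGIT) - 1


def raw_digits(number):
    """Emulate the bytes of the digit array for number (the sign is not included)."""
    magnitude = abs(number)
    chunks = []
    while magnitude:
        # Each 30-bit digit occupies one little-endian unsigned 32-bit slot
        chunks.append(struct.pack("<I", magnitude & DIGIT_MASK))
        magnitude >>= BITS_PER_DIGIT
    return b"".join(chunks)


print(raw_digits(0x12345678))  # b'xV4\x12'
print(raw_digits(0x40000000))  # b'\x00\x00\x00\x00\x01\x00\x00\x00'
print(raw_digits(-0x01))       # b'\x01\x00\x00\x00'
```

Under those assumptions the results agree with the Data lines in the output below for the same numbers, including the two cases where to_bytes differs (a negative number and a value of 2 ** 30 or more).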
Output:
(py35x64_test) e:\Work\Dev\StackOverflow\q053657865>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
Number: 0x0
Address: 0x0000000074C55318, Size: 0, Negative: False,
Data: b''
(0).to_bytes(): b''
Number: 0x0001
Address: 0x0000000074C55338, Size: 4, Negative: False,
Data: b'\x01\x00\x00\x00'
(1).to_bytes(): b'\x01\x00\x00\x00'
Number: -0x0001
Address: 0x0000000074C552F8, Size: 4, Negative: True,
Data: b'\x01\x00\x00\x00'
(-1).to_bytes(): b'\xff\xff\xff\xff'
Number: 0x00FF
Address: 0x0000000074C572F8, Size: 4, Negative: False,
Data: b'\xff\x00\x00\x00'
(255).to_bytes(): b'\xff\x00\x00\x00'
Number: 0xFFFF
Address: 0x0000023286E3A6C8, Size: 4, Negative: False,
Data: b'\xff\xff\x00\x00'
(65535).to_bytes(): b'\xff\xff\x00\x00'
Number: 0xFFFFFF
Address: 0x0000023286C14FA8, Size: 4, Negative: False,
Data: b'\xff\xff\xff\x00'
(16777215).to_bytes(): b'\xff\xff\xff\x00'
Number: 0x12345678
Address: 0x0000023286DE4E88, Size: 4, Negative: False,
Data: b'xV4\x12'
(305419896).to_bytes(): b'xV4\x12'
Number: 0x3FFFFFFF
Address: 0x000002328710C128, Size: 4, Negative: False,
Data: b'\xff\xff\xff?'
(1073741823).to_bytes(): b'\xff\xff\xff?'
Number: 0x40000000
Address: 0x000002328710C108, Size: 8, Negative: False,
Data: b'\x00\x00\x00\x00\x01\x00\x00\x00'
(1073741824).to_bytes(): b'\x00\x00\x00@\x00\x00\x00\x00'
Number: 0x1111111111
Address: 0x000002328710C148, Size: 8, Negative: False,
Data: b'\x11\x11\x11\x11D\x00\x00\x00'
(73300775185).to_bytes(): b'\x11\x11\x11\x11\x11\x00\x00\x00'
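
Going the other way, the Data buffer together with the sign flag should be enough to rebuild the value. A hedged sketch, assuming the same layout as in the run above (30-bit digits in 4-byte little-endian slots, least significant digit first); from_raw_digits is a hypothetical helper name:

```python
import struct

BITS_PER_DIGIT = 30  # assumption: 30-bit digits, as in the 64-bit build shown above


def from_raw_digits(data, negative=False):
    """Rebuild an int from raw digit bytes plus the sign flag."""
    if len(data) % 4:
        raise ValueError("expected a whole number of 4-byte digits")
    value = 0
    # iter_unpack yields one-element tuples, least significant digit first
    for index, (digit,) in enumerate(struct.iter_unpack("<I", data)):
        value |= digit << (index * BITS_PER_DIGIT)
    return -value if negative else value


print(hex(from_raw_digits(b"\x11\x11\x11\x11D\x00\x00\x00")))  # 0x1111111111
print(from_raw_digits(b"\x01\x00\x00\x00", negative=True))     # -1
```

The buffers passed in are copied from the Data lines above; a different digit width would need a different BITS_PER_DIGIT and slot format.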