
Say I have the following code in C:

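#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

union u_type
{
    uint32_t data;
    uint8_t  chunk[4];
} bits32data;   /* C identifiers cannot start with a digit */

int main(void)
{
    bits32data.chunk[0] = 0x01;   /* some number */
    bits32data.chunk[1] = 0x02;   /* some number */
    bits32data.chunk[2] = 0x03;   /* some number */
    bits32data.chunk[3] = 0x04;   /* some number */

    printf("Data in 32 bits: %" PRIu32 "\n", bits32data.data);
    return 0;
}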

How could I do a similar thing in Python?

I'm trying to read a binary file byte by byte (that part already works) and combine every 3 bytes into one int. I've heard struct would do the trick, but I'm not really sure how.

Asked by shjnlee, edited by Adrian Mole
  • You can just store four numbers in four variables; Python doesn't require types... – Qwerp-Derp Jul 28 '17 at 23:36
  • That's not what I'm asking. A union is convenient because it can combine four 8-bit values into one 32-bit value. I'm doing a project that needs to parse data into bytes, which is why I need similar functionality in Python. – shjnlee Jul 28 '17 at 23:39
  • But because there are no types, there's no need for a union in Python. I mean, you can create your own class for this, but I don't see why you would. – Qwerp-Derp Jul 28 '17 at 23:39
  • Take a look at the `struct` module – anthony sottile Jul 28 '17 at 23:39
  • There's no use for a `union` in Python. In Python, _data_ has a type, but _variables_ don't. This means that the variable-type / data-type mismatch that `union` is supposed to work around simply can't happen. – Kevin J. Chase Jul 28 '17 at 23:41
  • If you want to store _n_ bytes, there are standard data types for exactly that. [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes) is an immutable array of bytes, and [`bytearray`](https://docs.python.org/3/library/stdtypes.html#bytearray) is a mutable array of bytes. (See also: the Python [data model](https://docs.python.org/3/reference/datamodel.html).) – Kevin J. Chase Jul 28 '17 at 23:42
  • Thanks, guys. Python is not my language, but I sometimes write Python scripts for post-processing. Back to my question: I'm trying to read a binary file byte by byte (that part already works) and combine every 3 bytes into one int. I've heard struct would do the trick, but I'm not really sure how. – shjnlee Jul 28 '17 at 23:46
  • @shjnlee it's part of the standard library. It should be easy to grasp if you have a C background. Check out the [docs](https://docs.python.org/3/library/struct.html). You probably want to look at `struct.iter_unpack` – juanpa.arrivillaga Jul 28 '17 at 23:50
  • I could probably fix my example, but I don't think it would fit OP's requirements adequately. – cs95 Jul 28 '17 at 23:57
  • I only know C cursorily and don't really have a good grasp of unions. What exactly do you want to do with your raw bytes? – juanpa.arrivillaga Jul 29 '17 at 00:00
  • @Qwerp-Derp: "Python doesn't require types" - How does this square with Python being strongly typed? You confuse dynamic typing with "no typing"! – too honest for this site Jul 29 '17 at 01:52
  • XY problem. The code is problematic in both languages. Use bitshifts/masking in both C and Python! – too honest for this site Jul 29 '17 at 02:00
  • @toohonestforthissite - the code is problematic because alignment of union members is not guaranteed by the standard. One could use extensions like the `packed` attribute. Then there is the problem of endianness. But, for a given architecture and a given (well documented) compiler, what other problem (if any) do you see in this code? – ysap Dec 16 '20 at 14:30
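The bitshift/masking approach suggested in the comments can be sketched as follows. `combine_be` and `split_be` are hypothetical helper names, and big-endian order (most significant byte first) is an assumption; because the byte order is spelled out in the shifts, the result does not depend on the machine, unlike reading a C union.

```python
def combine_be(b0, b1, b2, b3):
    """Combine four 8-bit values into one 32-bit value, most significant byte first."""
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3


def split_be(value):
    """Split a 32-bit value into four 8-bit values, most significant byte first."""
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]


value = combine_be(0x00, 0xC0, 0xFF, 0xEE)
print(hex(value))        # 0xc0ffee
print(split_be(value))   # [0, 192, 255, 238]
```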

4 Answers


What about ctypes? It can declare a Union with the same memory layout as the C union:

from ctypes import CDLL, Union, c_uint8, c_uint32

# a ctypes array type holding four unsigned 8-bit values
uint8_array = c_uint8 * 4

class u_type(Union):
    # both members start at offset 0, like the members of the C union
    _fields_ = [("data", c_uint32), ("chunk", uint8_array)]

# load printf from the C library libc.so.6 (I'm using Linux)
libc = CDLL("libc.so.6")
printf = libc.printf

if __name__ == "__main__":
    # initialize union
    _32bitsdata = u_type()
    # set values to chunk
    _32bitsdata.chunk[:] = (1, 2, 3, 4)
    # and print it; %u because data is unsigned
    printf(b"Data in 32 bits: %u\n", _32bitsdata.data)
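Assuming the same layout, the union also works in the other direction: write through `data` and read the individual bytes back through `chunk`. This sketch skips libc entirely, so it should also run where `libc.so.6` is not available. The order of the bytes in `chunk` follows the machine's byte order, so the commented output assumes a little-endian machine.

```python
from ctypes import Union, c_uint8, c_uint32


class u_type(Union):
    # same layout as the C union: a uint32 overlapping four uint8 values
    _fields_ = [("data", c_uint32), ("chunk", c_uint8 * 4)]


u = u_type()
u.data = 0xC0FFEE       # write through the 32-bit member...
print(list(u.chunk))    # ...and read the bytes back: [238, 255, 192, 0] on little-endian
print(hex(u.data))      # 0xc0ffee
```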
Answer by Nick Tone, edited by GIZ

Here is what you would do. First, let's create the raw bytes we need. I'll cheat and use numpy:

>>> import numpy as np
>>> arr = np.array((8,4,2,4,8), dtype=np.uint32)
>>> arr
array([8, 4, 2, 4, 8], dtype=uint32)
>>> raw_bytes = arr.tobytes()
>>> raw_bytes
b'\x08\x00\x00\x00\x04\x00\x00\x00\x02\x00\x00\x00\x04\x00\x00\x00\x08\x00\x00\x00'

These could just as easily have been read from a file. Now, using the struct module is trivial. We use the unsigned int format character 'I' (with no prefix, struct uses the machine's native byte order and size):

>>> import struct
>>> list(struct.iter_unpack('I', raw_bytes))
[(8,), (4,), (2,), (4,), (8,)]

Note that each iteration yields a tuple; since our struct has only one member, we get a list of single-element tuples. Flattening this into a plain Python list is easy:

>>> [t[0] for t in struct.iter_unpack('I', raw_bytes)]
[8, 4, 2, 4, 8]

Another alternative is to read them into an array.array:

>>> import array
>>> my_array = array.array('I', raw_bytes)
>>> my_array
array('I', [8, 4, 2, 4, 8])
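Since the real bytes will come from a file, here is a sketch that writes them to a temporary file and reads them back; the temporary file is only a stand-in for your actual input. It uses '<I' (little-endian, standard 4-byte size) instead of plain 'I', so the result no longer depends on the machine it runs on. Note that `struct.iter_unpack` raises `struct.error` if the buffer length is not a multiple of the format's size.

```python
import os
import struct
import tempfile

# Write five little-endian uint32 values to a temporary file (stand-in for real input).
values = (8, 4, 2, 4, 8)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('<5I', *values))
    path = tmp.name

try:
    with open(path, 'rb') as f:
        raw_bytes = f.read()
    # '<I' = little-endian, standard 4-byte unsigned int, independent of the platform
    numbers = [t[0] for t in struct.iter_unpack('<I', raw_bytes)]
finally:
    os.remove(path)

print(numbers)  # [8, 4, 2, 4, 8]
```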
Answer by juanpa.arrivillaga

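If you're doing fancy numerical manipulation, you'd probably want to use the numpy library anyway, so consider the view method of numpy's ndarray type. A view shares memory with the original array, so modifying the view also modifies the original, much like writing to one member of a C union changes the other. (The output below is from a little-endian machine.)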

>>> import numpy as np
>>> a = np.uint32([1234567890])
>>> b = a.view(np.uint8)
>>> print(a)
[1234567890]
>>> print(b)
[210   2 150  73]
>>> b[2] = 10
>>> print(*b)
210 2 10 73
>>> print(*a)
1225392850
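If you'd rather not depend on numpy, the standard library's memoryview offers a roughly similar shared view of a bytearray. This sketch assumes the native unsigned int ('I') is 4 bytes, as on common platforms; the commented values assume a little-endian machine and match the numpy output above.

```python
import struct

# A 4-byte buffer; both views below share this memory, like the members of a C union.
buf = bytearray(struct.pack("=I", 1234567890))  # native byte order, standard 4-byte size
as_bytes = memoryview(buf)          # one unsigned byte per element (format 'B')
as_uint32 = as_bytes.cast("I")      # the same memory viewed as native unsigned ints

print(list(as_bytes))   # [210, 2, 150, 73] on a little-endian machine
as_bytes[2] = 10        # modify one byte through the byte view
print(as_uint32[0])     # 1225392850 on a little-endian machine
```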
Answer by Dave Rove

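You asked about a C union, but if your objective is to group 3 bytes into an int, you could use Python's struct.unpack instead. struct has no 3-byte integer format, so the code below puts a zero byte in front of the three data bytes and unpacks all four as a 32-bit unsigned int.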

import struct

chunk = bytearray()
chunk.append(0x00)   # zero padding byte: struct has no 3-byte integer format
chunk.append(0xc0)   # some number
chunk.append(0xff)   # some number
chunk.append(0xee)   # some number

# Convert to a 32-bit unsigned int.
# You didn't specify the byte order, so I'm using big-endian ('>').
# For little-endian data, use '<' and put the zero padding byte last instead of first.
data = struct.unpack('>I', chunk)[0]  # unpack returns a tuple, but we only need the first value

print(hex(data))  # prints 0xc0ffee
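Applied to the original goal of reading a file and combining every 3 bytes into one int, the same padding trick might look like the sketch below. `read_24bit_ints` is a hypothetical helper name, big-endian input is an assumption, and `io.BytesIO` stands in for a file opened with `open(path, 'rb')`.

```python
import io
import struct


def read_24bit_ints(stream):
    """Yield one unsigned int for every 3 bytes read from a binary stream (big-endian)."""
    while True:
        chunk = stream.read(3)
        if len(chunk) < 3:   # stop at EOF; a trailing partial group is ignored
            break
        # struct has no 3-byte integer format, so pad with a zero high byte first.
        yield struct.unpack('>I', b'\x00' + chunk)[0]


data = io.BytesIO(bytes([0xC0, 0xFF, 0xEE, 0x00, 0x01, 0x02]))
print([hex(n) for n in read_24bit_ints(data)])   # ['0xc0ffee', '0x102']
```

`int.from_bytes(chunk, 'big')` would give the same value for each 3-byte group without any padding.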