4

I have the following python class:

class Header:
  def __init__(self, id, len):
    self.id = id
    self.len = len

h = Header(1, 10)

How can I serialize/encode an instance of this class, h, to bytes or a bytearray that can be written to, for example, a socket?

To give a little more perspective, I need to write this object to a Unix domain socket where a C++ program is listening to receive it (the C++ side defines the same struct, with the same number and types of fields). Encoding with pickle.dump(...) does not work.

The C++ struct is:

typedef struct Header {
  uint32_t id;
  uint32_t len;
} Header;

In fact I am able to interface with this C++ program from Go, as follows.

import (
  "bytes"
  "encoding/binary"
)

type Header struct {
  ID uint32
  Len uint32
}

// output of this function is written to the socket opened by C++ and it works!!
func GetHeaderBuf() (*bytes.Buffer, error) {
  hdrBuf := new(bytes.Buffer)
  hdr := Header{1, 10}
  if err := binary.Write(hdrBuf, binary.LittleEndian, hdr); err != nil {
    return nil, err
  }
  return hdrBuf, nil
}

What I am looking for is the Python equivalent of the Go line binary.Write(...) above.

Curious
  • Duplicate of https://stackoverflow.com/questions/64057498/python-serialize-object-and-decode-return-an-invalid-start-byte-error – Razzle Shazl Feb 23 '21 at 07:59
  • And socket question likely answerable here: https://stackoverflow.com/a/28519877/2359945 . I can see it's hard to sift through results, I had to poke into about 6 or so myself. Hope this helps you. In the future, please consider sharing some links that you have researched and how they fall short of your expectations. – Razzle Shazl Feb 23 '21 at 08:03
  • "it defines the above struct exactly as above, with same number/type of fields" – what struct and what types? You are showing an arbitrary class with arbitrary fields of arbitrary types. Note that if you are interested in representing C-structs as in "data layout", Python has a module literally named `struct` for that. – MisterMiyagi Feb 23 '21 at 08:16

2 Answers

5

This is called serialization.

In Python, you can either use the standard library pickle module, which handles the (de-)serialization automatically, or serialize by hand. In the latter case, you decide which attributes to encode and how to encode them, and the struct module does the actual byte conversion.

pickle way:

import pickle

data = pickle.dumps(h)
h2 = pickle.loads(data)

manual way:

Let's say that we need 2 bytes to store an id (less than 65536) and 4 bytes to store a len. We could do

import struct

data = struct.pack('>HI', h.id, h.len)
h2 = Header(*struct.unpack('>HI', data))

Pickling uses an internal format and should only be used between Python applications. struct, on the other hand, is specially suited to heterogeneous applications. Here the > says that the integer values should use the so-called network order (big-endian). This eases the process of exchanging values between different architectures.
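To make the byte-order prefix concrete, here is a minimal sketch that packs the same 32-bit value both ways. The variable names are only for illustration.

```python
import struct

value = 1

# '>' selects big-endian (network order), '<' selects little-endian.
# 'I' is an unsigned 32-bit integer in both cases.
big = struct.pack('>I', value)
little = struct.pack('<I', value)

print(big)     # b'\x00\x00\x00\x01'
print(little)  # b'\x01\x00\x00\x00'
```

Both sides of the connection have to agree on the prefix, otherwise the receiver reads a different number.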

If the other side is written in C, struct is without doubt the way to go.
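For the struct in the question, a rough Python counterpart of the Go binary.Write(...) call might look like the sketch below. It assumes the C++ side expects two unsigned 32-bit fields in little-endian order with no padding, as the Go code's binary.LittleEndian suggests. HEADER_FORMAT, to_bytes and from_bytes are names made up for this example.

```python
import struct

# Assumed wire layout: two little-endian uint32 fields, no padding
# (mirrors binary.Write(..., binary.LittleEndian, hdr) in the Go code).
HEADER_FORMAT = '<II'


class Header:
    def __init__(self, id, len):
        self.id = id
        self.len = len

    def to_bytes(self):
        # Pack both fields into an 8-byte bytes object.
        return struct.pack(HEADER_FORMAT, self.id, self.len)

    @classmethod
    def from_bytes(cls, data):
        # Inverse of to_bytes: rebuild a Header from 8 raw bytes.
        return cls(*struct.unpack(HEADER_FORMAT, data))


h = Header(1, 10)
buf = h.to_bytes()
print(buf)       # b'\x01\x00\x00\x00\n\x00\x00\x00'
print(len(buf))  # 8
```

The resulting buf is a plain bytes object, so it can be handed to a connected socket's sendall(). If the receiver turns out to expect big-endian data, only the '<' prefix needs to change.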

Serge Ballesta
  • What to do in the situation when the object is not `pickle` serializable? In other words, how to solve this problem mentioned here: https://stackoverflow.com/q/69430747/6907424 – hafiz031 Oct 06 '21 at 03:25
-2

Right. First of all, I want to point out that this:

     ID = 0
     Len = 0

is completely useless. All it does is add two attributes on the class. Aside from this, you're not respecting Python naming conventions:

  • attributes are snake_cased
  • builtins should not be shadowed (id and len are both builtins).
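As a small illustration of both points, the class could be written along these lines. The attribute names header_id and length are only suggestions, not something the question prescribes.

```python
class Header:
    def __init__(self, header_id, length):
        # snake_case attribute names that do not shadow the id() and len() builtins
        self.header_id = header_id
        self.length = length


h = Header(1, 10)
print(h.header_id, h.length)  # 1 10
```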

Anyway there are quite literally dozens of ways to encode stuff to bytes, so you will first need to decide on which exchange protocol you want to use, which will depend on what's on the other end of the socket.

The simplest solution from a UX perspective is pickle: serializing objects is the entire point of the module, and it works out of the box. However, it is also the most dangerous option (pickles are a Python-adjacent bytecode, so accepting pickle input is essentially remote code execution), and it is 100% Python-only.

At the other end of the spectrum is json, which is incredibly well-supported by all sorts of languages, but knows nothing about Python so you'll need a customised encoder (and possibly a customised decoder on the other end if it's also Python).
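A customised encoder for the class in the question could look roughly like this sketch. HeaderEncoder and decode_header are hypothetical names, and the JSON field names are simply taken from the attributes.

```python
import json


class Header:
    def __init__(self, id, len):
        self.id = id
        self.len = len


class HeaderEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects json does not know how to serialize.
        if isinstance(o, Header):
            return {"id": o.id, "len": o.len}
        return super().default(o)


def decode_header(d):
    # object_hook: called for every decoded JSON object (a dict).
    if set(d) == {"id", "len"}:
        return Header(d["id"], d["len"])
    return d


payload = json.dumps(Header(1, 10), cls=HeaderEncoder).encode("utf-8")
print(payload)  # b'{"id": 1, "len": 10}'

h2 = json.loads(payload.decode("utf-8"), object_hook=decode_header)
print(h2.id, h2.len)  # 1 10
```

This only helps if the program on the other end parses JSON. It does not produce the raw struct layout a C or C++ reader would expect.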

There are also third-party libraries (e.g. MessagePack, FlatBuffers, ...) which have various tradeoffs, aside from not being supported directly by the stdlib.

Masklinn
  • Thanks for the comment regarding python convention; I had copied a struct from Go to quickly post the question. Edited it now. – Curious Feb 23 '21 at 08:09
  • If you're copying a struct from Go you likely need to interface / interact with Go, on the other end of the socket? In that case JSON and a customised encoder is most likely the way to go. – Masklinn Feb 23 '21 at 08:13
  • Actually I am trying to interface with C++ from both Go (successfully done) and Python (in progress). – Curious Feb 23 '21 at 08:38
  • Ah if you're doing that then you probably already have some sort of serialisation protocol in-place. In that case you may want to use the [`struct`](https://docs.python.org/3/library/struct.html#module-struct) module and serialise your instances completely by hand, if you're using an ad-hoc protocol between Go and C++. `struct` lets you do low-level conversions to binary data (controlling byte order, specific output size, ...) in C terms. – Masklinn Feb 23 '21 at 09:08