I have a Python object foo that I want to serialize, so I run:

import pickle

with open('foo.pkl', 'wb') as file:
    pickle.dump(foo, file)

I then submit the serialized object to a microservice running in another virtual environment. The problem is that foo depends on a module bar, so when the microservice deserializes the foo.pkl file, it fails with the following error:

ModuleNotFoundError: No module named 'bar'

This makes sense: unpickling requires the modules that define the object to be importable. However, I don't want to include a copy of bar in both microservices, since that would duplicate code in my codebase. So my question is: is there a way to serialize my object together with the bar module, so that I can transfer it across microservices?
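
To illustrate the failure, here is a minimal sketch that reproduces it in a single process. The `Model` class and the in-memory stand-in for `bar` are hypothetical and exist only for this demonstration. The pickle records the module and class names, not the class body, so loading fails once `bar` can no longer be imported.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the real `bar` module, built in memory.
bar = types.ModuleType("bar")
exec("class Model:\n    def __init__(self, weight):\n        self.weight = weight\n", bar.__dict__)
sys.modules["bar"] = bar

foo = bar.Model(weight=3)
payload = pickle.dumps(foo)

# Only the names needed to look the class up are stored in the pickle.
names_stored = b"bar" in payload and b"Model" in payload

# Simulate the second service, where `bar` is not installed.
del sys.modules["bar"]
try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)
```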

João Areias
  • @martineau the code is fairly small, so I wouldn't have a problem including it in the pickle file; I just have no idea how to do that. In the microservice structure I have here, `foo` is an ML model: service A is responsible for training the model, while service B is responsible for serving it to the web. – João Areias Feb 02 '22 at 02:40
  • The basic idea would be to read the `bar` module's source code into a string and then `dump()` it before the `foo` object. The process would need to be reversed when deserializing. To reduce the amount of data in the file, you could compress the source code string with the [`zlib.compress()`](https://docs.python.org/3/library/zlib.html#zlib.compress) function before saving it. – martineau Feb 02 '22 at 09:05
  • I'm the `dill` author. Building on what @martineau said, you may want to look at extracting source code from objects (including modules) as found in `dill` in `dill.source`. There's `dill.source.getimportable` when you are assuming anything importable will also be available, and `dill.source.getsource` when you always want to extract the source code. `dill.source` is actually used as the basis for a source-code driven parallel/distributed computing `multiprocessing` alternative called `ppft`. – Mike McKerns Feb 02 '22 at 17:59
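
The approach suggested in the comments could be sketched roughly as follows, using only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation: `BAR_SOURCE`, `Model`, and the three helper functions are hypothetical names, and the source is held in a string so the example is self-contained. In a real service, the source would more likely come from something like `inspect.getsource(bar)` or `dill.source.getsource`.

```python
import io
import pickle
import sys
import types
import zlib

# Hypothetical contents of bar.py, inlined so the sketch is self-contained.
BAR_SOURCE = '''
class Model:
    def __init__(self, weight):
        self.weight = weight

    def predict(self, x):
        return self.weight * x
'''

def load_module_from_source(name, source):
    """Build a module from a source string and register it in sys.modules."""
    module = types.ModuleType(name)
    sys.modules[name] = module  # registered so pickle can find it by name
    exec(compile(source, f"<bundled {name}>", "exec"), module.__dict__)
    return module

def dump_with_source(obj, module_name, source, file):
    """Write the compressed module source first, then the object itself."""
    packed = zlib.compress(source.encode("utf-8"))
    pickle.dump((module_name, packed), file)
    pickle.dump(obj, file)

def load_with_source(file):
    """Reverse the process: rebuild the module if needed, then unpickle."""
    module_name, packed = pickle.load(file)
    if module_name not in sys.modules:
        load_module_from_source(module_name, zlib.decompress(packed).decode("utf-8"))
    return pickle.load(file)

# "Service A": create the object and bundle it with the module source.
bar = load_module_from_source("bar", BAR_SOURCE)
foo = bar.Model(weight=3)
buffer = io.BytesIO()  # stands in for foo.pkl
dump_with_source(foo, "bar", BAR_SOURCE, buffer)

# "Service B": `bar` is not available until the bundle recreates it.
del sys.modules["bar"]
buffer.seek(0)
restored = load_with_source(buffer)
result = restored.predict(2)
```

Executing bundled source carries the same trust requirement as unpickling itself, so this only makes sense when service B fully trusts whatever service A produces.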

0 Answers