
So, I have the following monstrosity: a python2 script that needs a bunch of modules that are only available for python2, plus some functionality that requires libraries which don't work with the (old) python2 version I'm using. I figured I might as well implement that functionality with the latest python version, i.e. python3. So what I do now (in the python2 script) is use

subprocess.call(["/path/to/python3", "python3_script.py", "argument", "more argument"])

and then in the python3 script, I do

some_variable = sys.argv[1]
other_variable = sys.argv[2]

It's not pretty but it works because until now I only needed to pass simple strings.

However, now I need to send larger and more complex data structures (essentially dicts of objects). While I could theoretically strip out all the objects' methods, reimplement them as freestanding functions, manually serialize the dicts, and deserialize them on the python3 side, I'd like to use something more robust and less labor-intensive.

As I see it I have two options - use a portable serialization method (but that won't let me use objects with methods) or find some way to share object definitions and data between a python2 and python3 instance.

So, say I have a module called Foo, and in it I define a class Foo with some methods, can I use that from a python2 and a python3 process running at the same time? More specifically, will the .pyc files that are generated differ and interfere with each other?

Secondly, is there a way (either in the language or a library) that will let me serialize a data structure in python2, pass it as a string to a python3 script, and let me then deserialize it correctly from the python3 script?

Roel

1 Answer


You can safely share .py modules between Python 2 and Python 3, provided the code is compatible with both versions of the language. That's because Python 3 uses a different scheme to cache bytecode, where the files are minor-version specific; you'll have separate .pyc files for your Python 2 and Python 3 interpreters. See this Python programming FAQ entry:

When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a .pyc file containing the compiled code should be created in a __pycache__ subdirectory of the directory containing the .py file. The .pyc file will have a filename that starts with the same name as the .py file, and ends with .pyc, with a middle component that depends on the particular python binary that created it. (See PEP 3147 for details).

Even if that weren't the case, the two interpreters could still run side by side: .pyc files are read into memory in one step and contain version information too, so each interpreter would simply replace the cache file whenever it imports a module whose cached version doesn't match.

You can use the pickle module to serialise and deserialise across Python versions, provided you pick a protocol version both interpreters support (protocol 2 is the highest that Python 2 understands) and stick to a known codec for your Python 2 string data. See Unpickling a python 2 object with python 3 for details on the latter.
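A minimal sketch of that round trip, run entirely under Python 3 to simulate both ends (the dict contents here are invented for illustration): protocol 2 keeps the payload readable by both interpreters, and the `encoding` argument controls how Python 2 `str` bytes are turned back into text.

```python
import pickle

# Made-up payload standing in for the dicts of objects in the question.
data = {"widget": {"name": "foo", "count": 3}}

# Python 2 side: protocol 2 is the highest protocol Python 2 can write
# or read, so both interpreters understand the result.
payload = pickle.dumps(data, protocol=2)

# Python 3 side: encoding='latin1' maps Python 2 str bytes onto text
# without raising UnicodeDecodeError (use the codec your data actually
# uses; it only matters for pickles produced by Python 2).
restored = pickle.loads(payload, encoding="latin1")
```

The payload is plain bytes, so it can travel over any channel the two processes share: a pipe, a temp file, or a socket.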

Martijn Pieters
  • I support pickling. Your processes can communicate the pickled objects, the results or anything else over file descriptors (for example you could piggyback the STDIN and STDOUT of your new subprocess to communicate with the creating process) – polo Apr 08 '18 at 10:14
  • No longer a problem for my specific case, just for clarity - "the two interpreters could still run side by side because .pyc files contain version information in them too and they'd just be replaced each time either interpreter imports a module with an incorrect cache file version."; would that still work when one process calls another and then continues to use the compiled structures after the second process returns? I guess the pyc is only read at the point the import is done, and after that it's all in memory, so the second process overwriting the pyc of the first doesn't matter any more? – Roel Apr 08 '18 at 10:22
  • @Roel: exactly. `.pyc` files are loaded (using the `marshal` module) into code objects in memory, and the top-level code object is executed, producing a module namespace. The file on disk is not needed again past this point. – Martijn Pieters Apr 08 '18 at 10:29
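The stdin/stdout piggybacking polo suggests can be sketched like this. To keep the example self-contained, both ends run under the current interpreter via `-c`; in the real setup the child command would be `["/path/to/python3", "python3_script.py"]`, and the Python 2 parent would write to the pipe the same way (under Python 2 the child script would read `sys.stdin` directly rather than `sys.stdin.buffer`):

```python
import pickle
import subprocess
import sys

# Stand-in for python3_script.py: read a pickle from stdin, do some
# work, write a pickle back to stdout.
child_code = (
    "import pickle, sys;"
    "data = pickle.load(sys.stdin.buffer);"
    "data['seen_by_child'] = True;"
    "pickle.dump(data, sys.stdout.buffer, protocol=2)"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Parent side: send a protocol-2 pickle over the child's stdin and
# read the pickled reply back from its stdout.
out, _ = proc.communicate(pickle.dumps({"job": "example"}, protocol=2))
result = pickle.loads(out)
```

Using `communicate()` avoids the deadlock you can get when writing to the child's stdin while its stdout pipe fills up.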