
I have two Python scripts: object_generator.py, which pickles a given object and prints it, and object_consumer.py, which picks up the output of the first script through subprocess.communicate and tries to unpickle it using pickle.loads. I am having trouble making this simple scenario work. This is my code:

object_generator.py

import pickle
import base64

o = {'first':1,'second':2,'third':3,'ls':[1,2,3]}
d = pickle.dumps(o)
print(d)

#Various Approaches I had tried, none of which worked. Ignore this part.
#s = base64.b64decode(d)
#encoded_str = str(d).encode('ascii')
#print('encoded str is :')
#print(encoded_str)
#decoded_str = encoded_str.decode('ascii')
#print('decoded str is :')
#print(decoded_str)
#unpickled_obj = pickle.loads(bytes(decoded_str))
#print(unpickled_obj)
#print(type(d))
#print(codecs.decode(d))

object_consumer.py

import pickle
import subprocess
import os

dr = '"' + os.path.dirname(os.path.abspath(__file__)) + '\\object_generator.py"'

cmd = 'python -u ' + dr

proc = subprocess.Popen(cmd,stdout=subprocess.PIPE)

try:
    outs, errs = proc.communicate(timeout=15)
except subprocess.TimeoutExpired:  # TimeoutExpired lives in the subprocess module
    proc.kill()
    outs, errs = proc.communicate()

# 'outs' at this point is something like this : 
# b"b'\\x80\\x03}q\......x05K\\x03u.'\r\n"
# DO SOMETHING WITH outs to get back the bytes which can then be 
# unpickled using pickle.loads

obj = pickle.loads(outs)
print(obj)

Clearly I need to strip off the trailing \r\n, which is easy, but what should be done next?
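What arrives in `outs` is not the pickle itself but the text that `print(d)` wrote: the `repr()` of the bytes object (`b'\x80...'`) followed by a line ending. If the child script is left unchanged, one workaround (not taken from either answer) is to strip the line ending and evaluate that repr literal back into bytes with `ast.literal_eval`. A minimal sketch that simulates `outs` in-process instead of spawning the child:

```python
import ast
import pickle

o = {'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]}

# Simulate what the child's print(d) sends down the pipe on Windows:
# the repr of the bytes object followed by a "\r\n" line ending.
outs = (repr(pickle.dumps(o)) + '\r\n').encode('ascii')

# The repr of a bytes object is pure ASCII, so decode it, drop the line
# ending, and let literal_eval turn the b'...' literal back into bytes.
payload = ast.literal_eval(outs.decode('ascii').strip())

obj = pickle.loads(payload)
print(obj)
```

This recovers the object, but it keeps an extra text round trip; the answers below avoid printing the repr in the first place.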

Tapan Nallan
  • Does your effort aim at having the scenario work using the given `subprocess` module method `.communicate()`, or are you open to implementing a Python process-to-process communication solution, be it using `subprocess` or other means? – user3666197 Oct 06 '14 at 15:17
  • I tried subprocess.check_output and then moved to communicate when that didn't work, but I have no restriction to use communicate. I would appreciate any alternate method along with an explanation of what is wrong in this. – Tapan Nallan Oct 06 '14 at 15:23
  • It's always good to separate the issue of process-to-process communication from the issues of payload representation/coding/serialisation/encapsulation/framing. For the former, it may be useful to get an overview of **ZeroMQ**, a fast, scalable process-to-process messaging library (with many ports, incl. Python), which puts all the powers under your control, so you can fit the communication archetypes to your distributed parallel processing project's needs. – user3666197 Oct 06 '14 at 16:35
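The separation suggested in the last comment, between moving bytes and framing the payload, can be illustrated with the standard library alone (this is not ZeroMQ). In the minimal sketch below, each pickled message is preceded by its size as an 8-byte big-endian unsigned integer, so a reader knows exactly where each payload ends regardless of which bytes it contains. The helper names `write_frame`, `read_exactly` and `read_frame` are hypothetical, and an in-memory `io.BytesIO` stands in for a pipe such as `proc.stdout`:

```python
import io
import pickle
import struct

# Hypothetical framing choice: 8-byte unsigned big-endian length header.
HEADER = struct.Struct('>Q')


def write_frame(stream, obj):
    """Pickle obj and write it to a binary stream, prefixed by its length."""
    payload = pickle.dumps(obj)
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exactly(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError('stream ended mid-frame')
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)


def read_frame(stream):
    """Read one length-prefixed frame and unpickle it."""
    (size,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return pickle.loads(read_exactly(stream, size))


# Usage: two messages written back to back, then read back in order.
buf = io.BytesIO()
write_frame(buf, {'first': 1, 'ls': [1, 2, 3]})
write_frame(buf, 'second message')
buf.seek(0)
print(read_frame(buf))
print(read_frame(buf))
```

Across processes, the child would write frames to `sys.stdout.buffer` and the parent would call `read_frame(proc.stdout)`; the framing code itself does not change.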

2 Answers


There are a couple of issues going on here. First, you're printing a bytes object in object_generator.py. In Python 3.x, that results in str(obj) being called, which means b'yourbyteshere' gets printed. You don't want the leading b' or the trailing '. To fix that, you need to decode the bytes object to a str. The 'latin-1' encoding maps each of the 256 byte values to exactly one character, so we can use it to decode the bytes object to a str and later get the identical bytes back. The other issue is that the encoding Windows uses by default for sys.stdout can't represent every character a latin-1-decoded pickle contains, so printing it fails. So, we need to change the encoding of sys.stdout* to 'latin-1', so the string reaches the parent process with the correct encoding.
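Why latin-1 is safe for this can be checked directly: it decodes each byte to the code point with the same number (U+0000 to U+00FF), so re-encoding gives back exactly the original bytes. The last part of the sketch assumes a typical Western-European Windows default of cp1252 (the answer does not name the encoding) and shows that such a codec cannot encode the decoded pickle text:

```python
import pickle

# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 decodes each byte to the code point with the same number...
text = all_bytes.decode('latin-1')
assert [ord(ch) for ch in text] == list(range(256))

# ...so encoding back with latin-1 reproduces the original bytes exactly.
assert text.encode('latin-1') == all_bytes

# A real pickle survives the same round trip.
d = pickle.dumps({'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]})
assert d.decode('latin-1').encode('latin-1') == d

# Assumption: stdout uses cp1252. Pickles from protocol 2 onward start with
# byte 0x80, which latin-1 decodes to U+0080, and cp1252 has no byte for
# U+0080, so writing the decoded text through cp1252 fails.
try:
    d.decode('latin-1').encode('cp1252')
    cp1252_can_encode = True
except UnicodeEncodeError:
    cp1252_can_encode = False
print('cp1252 can encode the pickle text:', cp1252_can_encode)
```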

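import io
import pickle
import sys

o = {'first':1,'second':2,'third':3,'ls':[1,2,3]}
d = pickle.dumps(o)
# Re-wrap stdout's binary buffer with latin-1; newline='' stops a 0x0A byte
# in the pickle from being translated to '\r\n' on Windows.
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='latin-1', newline='')
print(d.decode('latin-1'), end='', flush=True)  # end='' suppresses the trailing \r\n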

Make those changes, and it should work fine.
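An end-to-end sketch of this approach, with the generator passed to the interpreter via `-c` so the example is self-contained (the inline `CHILD` script is a hypothetical stand-in for object_generator.py, and `sys.executable` replaces the literal `python` command). It passes `newline=''` to `io.TextIOWrapper`, because with the default `newline=None` any `'\n'` written is translated to `os.linesep`, which would corrupt a 0x0A byte in the pickle on Windows:

```python
import pickle
import subprocess
import sys

# Hypothetical stand-in for object_generator.py: decode the pickle as
# latin-1 and write it through a latin-1 text wrapper with no newline.
CHILD = r'''
import io
import pickle
import sys

o = {'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]}
d = pickle.dumps(o)
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='latin-1', newline='')
print(d.decode('latin-1'), end='', flush=True)
'''

proc = subprocess.Popen([sys.executable, '-c', CHILD], stdout=subprocess.PIPE)
outs, errs = proc.communicate(timeout=15)

# outs now holds the original pickle bytes, so it can be unpickled directly.
obj = pickle.loads(outs)
print(obj)
```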

Edit:

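Another option is to set the PYTHONIOENCODING environment variable to 'latin-1' for the child from the parent process. The child then still prints d.decode('latin-1') with end='', but no longer needs to re-wrap sys.stdout: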

import os
import subprocess

env = os.environ.copy()
env['PYTHONIOENCODING'] = 'latin-1'
proc = subprocess.Popen(['python3', 'object_generator.py'], stdout=subprocess.PIPE, env=env)
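A self-contained sketch of the environment-variable approach; the inline `CHILD` script is a hypothetical stand-in for the generator, run with `sys.executable`:

```python
import os
import pickle
import subprocess
import sys

# Hypothetical stand-in for object_generator.py: it still decodes the pickle
# as latin-1 and suppresses the trailing newline, but does not re-wrap
# sys.stdout, because PYTHONIOENCODING already makes stdout use latin-1.
CHILD = r'''
import pickle
o = {'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]}
print(pickle.dumps(o).decode('latin-1'), end='', flush=True)
'''

env = os.environ.copy()
env['PYTHONIOENCODING'] = 'latin-1'

proc = subprocess.Popen([sys.executable, '-c', CHILD],
                        stdout=subprocess.PIPE, env=env)
outs, errs = proc.communicate(timeout=15)

obj = pickle.loads(outs)
print(obj)
```

One caveat: on Windows the standard sys.stdout translates '\n' to '\r\n' when writing (the trailing \r\n in the question comes from exactly that), so a pickle that happens to contain a 0x0A byte would still arrive altered with this variant.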

* See this question for more info on changing the sys.stdout encoding in Python 3. Both approaches I show here are mentioned there.
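A further option, not part of this answer, sidesteps text encoding and newline translation altogether: the child writes the pickle to `sys.stdout.buffer`, the binary stream underneath `sys.stdout`, and the parent passes `outs` straight to `pickle.loads`. A minimal sketch with a hypothetical inline child, reusing the question's timeout handling:

```python
import pickle
import subprocess
import sys

# Hypothetical stand-in for object_generator.py: write the raw pickle bytes
# to the binary buffer beneath sys.stdout, with no text layer involved.
CHILD = r'''
import pickle
import sys
o = {'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]}
sys.stdout.buffer.write(pickle.dumps(o))
sys.stdout.buffer.flush()
'''

proc = subprocess.Popen([sys.executable, '-c', CHILD], stdout=subprocess.PIPE)
try:
    outs, errs = proc.communicate(timeout=15)
except subprocess.TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

obj = pickle.loads(outs)
print(obj)
```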

dano
  • It worked. I see you retracted your answer and re-posted with the fix. Appreciate your patience and the trouble taken. – Tapan Nallan Oct 06 '14 at 16:30
  • @rusticbit Yes, my original answer worked fine on Linux, but didn't work on Windows. I forgot about the difference in the default encoding `sys.stdout` uses on Windows. – dano Oct 06 '14 at 16:37
  • @rusticbit I've updated my answer to use the `io` module instead of `codecs`, since the `io` module [should generally be preferred over the `codecs` module](https://mail.python.org/pipermail/python-list/2010-December/593460.html). I also added a way to solve the problem from the parent script. – dano Oct 06 '14 at 16:48

I don't suggest using pickle between your main script and an unknown external one, since unpickling requires the original classes to be importable, and it's also slow.

I used the marshal module; I hope this saves you time: https://github.com/jstar88/pyCommunicator
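For built-in types such as the question's dict, `marshal.dumps`/`marshal.loads` can stand in for pickle. A minimal sketch, not taken from the linked project, with a hypothetical inline child. The marshal format may change between Python versions and supports only core types (numbers, strings, bytes, tuples, lists, sets, dicts and the like), so both sides here run the same interpreter via `sys.executable`:

```python
import marshal
import subprocess
import sys

# Hypothetical child: serialise the dict with marshal and write the raw
# bytes to the binary stdout buffer.
CHILD = r'''
import marshal
import sys
o = {'first': 1, 'second': 2, 'third': 3, 'ls': [1, 2, 3]}
sys.stdout.buffer.write(marshal.dumps(o))
sys.stdout.buffer.flush()
'''

# Same interpreter on both sides, since the marshal format is version-specific.
proc = subprocess.run([sys.executable, '-c', CHILD],
                      stdout=subprocess.PIPE, check=True, timeout=15)

obj = marshal.loads(proc.stdout)
print(obj)
```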

user2054758