
Recently a question was posed about some Python code that attempts distributed computing by pickling processes. Apparently this used to be possible, but it has since been disabled for security reasons. On the second attempt, which transmitted a function object through a socket, only the reference was transmitted. Correct me if I am wrong, but I do not believe this issue is related to Python's late binding. Given the presumption that process and thread objects cannot be pickled, is there any way to transmit a callable object? We would like to avoid transmitting compressed source code for each job, as that would probably make the entire attempt pointless. For portability reasons, only the Python core library can be used.
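To illustrate the by-reference behaviour described above, here is a minimal sketch in Python 3. It pickles `json.dumps`, an arbitrary standard-library function chosen only for the demonstration, and shows that the payload holds the module and function names rather than any bytecode.

```python
import json
import pickle

# Pickling a function stores only where to find it (module and name).
payload = pickle.dumps(json.dumps)
print(len(payload), payload)

# Unpickling imports the module and looks the name up again, so the
# receiver must already have the same function available.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

This is why a function that exists only in the sending program cannot be rebuilt from its pickle on another machine.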

motoku
  • Whenever you find yourself asking the question "X isn't allowed *for security reasons*; how can I work around this?", you should be prepared with a **detailed** explanation of why you **need** to work around it. Security is serious business. – Karl Knechtel Jun 04 '11 at 05:00
  • @Karl, That is an excellent point. Stating that a workaround is a need would be an exaggeration. I made the mistake of even suggesting the easy way out. I'll keep at it, starting with Artur's suggestion. – motoku Jun 04 '11 at 05:41
  • If code is pure Python, then it is just as portable as the Python core library. I have a pure Python serializer that can pickle any callable, and it is used as the backbone for a pure Python parallel and distributed computing library. It's portable, and it can build networks of hierarchical parallel and distributed parallel maps and pipes. My point is, if a package is pure Python, you shouldn't exclude it -- you just install it to your user area on the distributed cluster if it's not already installed. See the `dill` package, which is basically a collection of pickle `copy_reg` calls. – Mike McKerns Mar 03 '15 at 01:53

1 Answer


You could marshal the function's bytecode and pickle its other attributes:

import marshal
import pickle

# marshal serializes only the code object; the function's name, default
# arguments and closure are not part of it, so they have to be sent separately.
marshaled_bytecode = marshal.dumps(your_function.func_code)
pickled_name = pickle.dumps(your_function.func_name)
pickled_arguments = pickle.dumps(your_function.func_defaults)
# func_closure is None for a function without free variables. Cell objects
# themselves cannot be pickled, so this line only works for such functions.
pickled_closure = pickle.dumps(your_function.func_closure)
# Send the marshaled bytecode and the other attributes through a socket (they are byte strings).
send_through_a_socket((marshaled_bytecode, pickled_name, pickled_arguments, pickled_closure))

In the receiving Python program:

import marshal
import pickle
import types

# Receive the marshaled bytecode and the other function attributes.
marshaled_bytecode, pickled_name, pickled_arguments, pickled_closure = receive_from_a_socket()
# Rebuild the function. globals() is the namespace in which it will look up global names.
your_function = types.FunctionType(
    marshal.loads(marshaled_bytecode),
    globals(),
    pickle.loads(pickled_name),
    pickle.loads(pickled_arguments),
    pickle.loads(pickled_closure),
)

Any global names the function refers to (imported modules, helper functions, constants) are not transmitted with it, so they would have to be recreated in the script that receives the function.
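The point about globals can be seen in a small sketch. It is written for Python 3 (so it uses `__code__` rather than `func_code`), and the names `SCALE`, `scaled` and `receiver_globals` are hypothetical, chosen only for this illustration. The code object stores just the name of the global, and the value is looked up in whatever dictionary the receiver passes to `types.FunctionType`.

```python
import marshal
import types

SCALE = 10  # a global the function depends on (sender side)

def scaled(x):
    return x * SCALE

code_bytes = marshal.dumps(scaled.__code__)

# Receiver side: only the *name* "SCALE" travels with the code object.
code = marshal.loads(code_bytes)
print(code.co_names)

# The receiver supplies its own namespace, and with it its own SCALE.
receiver_globals = {"SCALE": 3}
rebuilt = types.FunctionType(code, receiver_globals, "scaled")
result = rebuilt(2)
print(result)

# Without the global in the namespace, the call fails with NameError.
bare = types.FunctionType(code, {}, "scaled")
try:
    bare(2)
    lookup_failed = False
except NameError:
    lookup_failed = True
print(lookup_failed)
```

Passing `globals()` on the receiving side, as in the answer, works the same way: the function sees whatever that script has defined at module level.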

In Python 3, the function attributes used are __code__, __name__, __defaults__ and __closure__.
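As a rough Python 3 sketch of the same idea, the helpers below (`pack_function`, `unpack_function` and `_make_cell` are hypothetical names, not part of any library) bundle the four attributes into one byte string. Because cell objects cannot be pickled, this version sends the values stored in the closure cells and recreates the cells on the other side, which is an addition to the approach in the answer. It also assumes both ends run the same Python version, since the marshal format is not guaranteed to be compatible between versions.

```python
import marshal
import pickle
import types

def pack_function(func):
    """Serialize a plain Python function: marshal the code, pickle the rest."""
    closure = func.__closure__
    # Send the values held by the closure cells, not the cells themselves.
    cell_values = None if closure is None else tuple(c.cell_contents for c in closure)
    return pickle.dumps((
        marshal.dumps(func.__code__),
        func.__name__,
        func.__defaults__,
        cell_values,
    ))

def _make_cell(value):
    # Create a fresh cell object holding `value`.
    return (lambda: value).__closure__[0]

def unpack_function(payload, namespace):
    """Rebuild a function from pack_function() output in the given globals dict."""
    code_bytes, name, defaults, cell_values = pickle.loads(payload)
    closure = None if cell_values is None else tuple(_make_cell(v) for v in cell_values)
    return types.FunctionType(marshal.loads(code_bytes), namespace, name, defaults, closure)

def make_adder(n):
    def add(x, y=1):
        return x + y + n  # n is a free variable stored in a closure cell
    return add

payload = pack_function(make_adder(10))
rebuilt = unpack_function(payload, {})
print(rebuilt(1))     # 12
print(rebuilt(1, 5))  # 16
```

The closure values and defaults still go through pickle, so they must themselves be picklable.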

Note that send_through_a_socket and receive_from_a_socket do not actually exist; you should replace them with actual code that transmits data through sockets.
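One possible shape for those two placeholders is sketched below in Python 3, using only the standard library. The names `send_blob` and `recv_blob` are hypothetical, and the 4-byte length prefix is just one simple framing choice, assumed here so the receiver knows where each byte string ends. The demonstration uses `socket.socketpair()` so it runs in a single process; a real setup would use connected TCP sockets instead.

```python
import socket
import struct

def send_blob(sock, data):
    """Send one byte string, prefixed with its length as a 4-byte big-endian integer."""
    sock.sendall(struct.pack("!I", len(data)) + data)

def _recv_exact(sock, count):
    """Read exactly `count` bytes, since recv() may return fewer than requested."""
    chunks = []
    while count:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

def recv_blob(sock):
    """Receive one byte string that was sent with send_blob()."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

# Demonstration with a connected pair of sockets in one process.
left, right = socket.socketpair()
try:
    send_blob(left, b"marshaled bytecode")
    send_blob(left, b"pickled name")
    first = recv_blob(right)
    second = recv_blob(right)
finally:
    left.close()
    right.close()

print(first, second)
```

Each of the four byte strings from the answer could be sent with one `send_blob` call and read back in the same order with `recv_blob`.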

Artur Gaspar
  • [That](http://stackoverflow.com/questions/6212326/python-distributed-computing-with-error-edited) _seems_ to be partially functioning. – motoku Jun 04 '11 at 08:39