
I'm trying to speed up a Python program. I noticed that there is a thread always running that scans input from an external resource; when it gets something, it calls another function that parses the input data and returns understandable information (the parsing function also uses other functions).

A simple model of the scanning() function:

def scanning(x):
    alpha = GetSomething(x)    # read from the external resource
    if alpha != 0:
        print(Parsing(alpha))  # parse the raw data into usable information

So my idea is to convert this thread into a process that will run in parallel with the main process, and when it gets something, it will send it using a Queue to the main process which should then call the parsing function.
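Roughly what I have in mind (just a sketch; GetSomething, Parsing and x stand in for my real code):

import multiprocessing

def scanning_process(x, q):
    # runs in a separate process: scan forever, forward anything found
    while True:
        alpha = GetSomething(x)
        if alpha != 0:
            q.put(alpha)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=scanning_process, args=(x, q))
    p.start()
    while True:
        alpha = q.get()           # receive raw data from the scanner...
        print(Parsing(alpha))     # ...and parse it in the main process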

My questions are: is it possible to keep the scanning() function as it is and use it inside a process (even if it calls other functions)?

If not, what modifications to the structure of the scanning() function are required so it can be used with the multiprocessing module?

What is the proper way to multiprocess a function that calls other functions in Python?

werber bang
  • I guess it would be like you said: a thread to populate a queue, some threads to scan whatever is in the queue – Whitefret Apr 05 '16 at 11:59
  • the option above is viable if the inputs are faster to get than the scanning part – Whitefret Apr 05 '16 at 12:03
  • @Whitefret I'm trying to replace threading with multiprocessing, but I wonder what's the proper way to do it since `scanning()` has a lot of function calls within it; I would appreciate it if you could help – werber bang Apr 05 '16 at 14:48
  • I don't see why you would use multiprocessing unless you want to use several machines at the same time. In that case, I don't know a way to do that in plain Python, but you could use MPI in C, RMI in Java, or even Map/Reduce – Whitefret Apr 05 '16 at 16:33
  • @Whitefret the scanning thread is always running, so I want to profit from multiprocessing so it can run on a separate core, not the same one as the main program – werber bang Apr 05 '16 at 16:54
  • I might be wrong, but multithreading makes you use every core, not just the one where the process is running. If you really want that kind of behavior, you need to look at distributed system programming: https://wiki.python.org/moin/DistributedProgramming – Whitefret Apr 05 '16 at 17:27

1 Answer


Short answer: yes, it is possible.

To understand why, you need to understand one thing about multiprocessing: it does not move the invoked function into a separate process by itself; it creates a full replica of your entire process, including its code, loaded modules and any global data that were initialized before you forked your processes. (Strictly speaking this describes the default fork start method on Unix; on Windows the child re-imports your module instead, with a similar result for module-level definitions.)

So if your code defines some sub-functions, they will still be available to your function after it has been moved into a separate process, along with any data that were initialized beforehand. Any modifications to values, functions and namespaces of your main process after the fork will not affect the forked process at all; you need special tools to communicate between processes.

So, let's suppose you have the following code (the original abstract example, made concrete in Python):

import multiprocessing

def SomeFunction():
    pass                        # any helper defined before the split

SomeValue = 10                  # global data initialized before the split

def ChildProcess():
    global SomeValue
    SomeFunction()              # still available in the child process
    SomeValue += 1              # increases only the child's copy
    # do ChildProcessStuff

if __name__ == '__main__':
    child = multiprocessing.Process(target=ChildProcess)
    child.start()               # the process splits here
    SomeValue -= 1              # decreases only the main process's copy
    # do MainProcessStuff

For both the main and the spawned process, your code executes identically up to the line child.start(). After this line your process splits into two copies which are fully identical at first but have different points of execution. The main process goes past this line and proceeds straight to MainProcessStuff, while the child process never reaches it: instead, it gets a replica of the entire namespace and starts executing ChildProcess() as if it had been called like a normal function, followed by an exit().

Note how both the main and child processes have access to SomeValue. Also note how their changes to it are independent, because they are made in different namespaces (and therefore to different SomeValues). This would not be the case with the threading module, which does not split namespaces, and that is an important distinction.

Also note that the main process never executes the code in ChildProcess, but it retains a reference to the child process, which can be used to track its progress, terminate it prematurely, and so on.
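For illustration, a few standard things you can do with that reference (these are stock multiprocessing.Process methods, using the child handle from the example above):

child.is_alive()       # True while the child is still running
child.join(timeout=5)  # wait up to 5 seconds for the child to finish
child.terminate()      # stop the child prematurely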

You might also be interested in more in-depth information about Python threads and processes here.

Lav
  • if we take the example of the `scanning` process, it will be a new copy of the original application that got it running in the first place, so I can call the parsing function from within it. But how would I send the parsed data back to the main process? – werber bang Apr 07 '16 at 13:22
  • @werberbang Typically, to communicate between processes you use [pipes and/or queues](https://docs.python.org/2.7/library/multiprocessing.html#pipes-and-queues). Create a pipe/queue instance before splitting processes, provide the `Process` class with a reference to the pipe/queue object (usually done by passing it to the constructor), and once you start your child process, both processes will have access to the pipe/queue and can read/write from it; see the sketch after these comments. – Lav Apr 08 '16 at 13:55
  • by doing that, will the main process still be active (performing other operations and interacting with the user) while the child process performs the read and parse operations, or will it be in a wait state until it reads something from the shared Queue? – werber bang Apr 10 '16 at 12:15
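A minimal sketch of the pattern described in these comments (worker and the sample items are placeholders). Note that Queue.get() blocks by default, but get_nowait() (or get(timeout=...)) returns immediately, so the main process only waits if you ask it to:

import multiprocessing
try:
    from queue import Empty      # Python 3
except ImportError:
    from Queue import Empty      # Python 2

def worker(q):
    # child process: stand-in for the scanning loop, forwarding results
    for item in ('a', 'b', 'c'):
        q.put(item)

if __name__ == '__main__':
    q = multiprocessing.Queue()                # create the queue before the split
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()                                  # both processes can now use q
    received = 0
    while received < 3:
        try:
            item = q.get_nowait()              # non-blocking read
        except Empty:
            continue                           # nothing yet; main stays free for other work
        print(item)
        received += 1
    p.join()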