
So I have a problem. I'm trying to make my imports faster, so I started using the multiprocessing module to split a group of imports into two functions and then run each on a separate core, thus speeding up the imports. But now the code will not recognize the modules at all. What am I doing wrong?

import multiprocessing


def core1():
    import wikipedia
    import subprocess
    import random
    return wikipedia, subprocess, random



def core2():
    from urllib import request
    import json
    import webbrowser
    return request, json, webbrowser


if __name__ == "__main__":
    start_core_1 = multiprocessing.Process(name='worker 1', target=core1, args = core2())
    start_core_2 = multiprocessing.Process(name='worker 2', target=core2, args = core1())
    start_core_1.start()
    start_core_2.start()

while True:
    user = input('[!] ')
    with request.urlopen('https://api.wit.ai/message?v=20160511&q=%s&access_token=Z55PIVTSSFOETKSBPWMNPE6YL6HVK4YP' % request.quote(user)) as wit_api:  # call to wit.ai api
        wit_api_html = wit_api.read()
        wit_api_html = wit_api_html.decode()
        wit_api_data = json.loads(wit_api_html)
    intent = wit_api_data['entities']['Intent'][0]['value']
    term = wit_api_data['entities']['search_term'][0]['value']
    if intent == 'info_on':
        with request.urlopen('https://kgsearch.googleapis.com/v1/entities:search?query=%s&key=AIzaSyCvgNV4G7mbnu01xai0f0k9NL2ito8vY6s&limit=1&indent=True' % term.replace(' ', '%20')) as response:
            google_knowledge_base_html = response.read()
            google_knowledge_base_html = google_knowledge_base_html.decode()
            google_knowledge_base_data = json.loads(google_knowledge_base_html)
            print(google_knowledge_base_data['itemListElement'][0]['result']['detailedDescription']['articleBody'])
    else:
        print('Something')
  • What is the exact error/issue you are facing? Try describing your problem in detail. – Reck Feb 25 '18 at 17:07
  • @Reck I am getting a TypeError: can't pickle module objects error, as you can see in the following image: http://prntscr.com/ijkm0x – user3657752 Feb 25 '18 at 17:25
  • This should have been mentioned/added in the question, so that it would be easier to understand the exact issue. – Reck Feb 25 '18 at 17:28
  • Yes, I know, but I don't often use SO, so I forgot about it. Can you help me? – user3657752 Feb 25 '18 at 17:29
  • From the error message I can infer that the module instances you are passing as args are fed to the multiprocessing process, and these processes use pickle to store the process dumps. The catch here is that the **pickle module** cannot pickle module objects. And please indent your code. – Reck Feb 25 '18 at 17:40
  • You may want to [check this](https://stackoverflow.com/questions/2790828/python-cant-pickle-module-objects-error) once. – Reck Feb 25 '18 at 17:41
  • Wouldn't that only work after the module has been imported? I want to speed up the imports. – user3657752 Feb 25 '18 at 18:04
  • Did you try without the **args** argument in multiprocessing.Process? – Kishan Pradhan Feb 25 '18 at 18:29
  • Let's step back and ask **why speed up the imports** at all. You can try rearranging the imports at the start and see if that works for you. And remember **KISS**. – Reck Feb 25 '18 at 18:29
  • And I doubt this is working for you at all. As pointed out by @Kishan, you are passing a core function as target with args returned by the other core function, and the core methods do not accept any args. – Reck Feb 25 '18 at 18:31
  • Then what other approach would you suggest that would decrease startup time in general? – user3657752 Feb 25 '18 at 18:36

1 Answer


I think you are missing the important parts of the whole picture, i.e. the crucial things you need to know about multiprocessing before using it.

Once you know them, you will understand why you can't just import modules in a child process and speed things up, and why even returning the loaded modules is not a good answer.

First, when you use multiprocessing.Process, a child process is forked (on Linux) or spawned (on Windows). I'll assume you are using Linux. In that case, every child process inherits every module already loaded by the parent (its global state). When a child process changes anything, such as global variables, or imports new modules, those changes stay in its own context only, so the parent process is not aware of them. I believe part of this can also be of interest.
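Here is a minimal sketch of that behaviour (webbrowser is just a stand-in for any module the parent has not loaded yet): the import succeeds inside the child, but the parent's sys.modules is untouched, so the parent would still have to import it itself.

import multiprocessing
import sys


def child_imports():
    import webbrowser  # this import happens only in the child's copy of the interpreter
    print('child sees webbrowser:', 'webbrowser' in sys.modules)   # True


if __name__ == '__main__':
    p = multiprocessing.Process(target=child_imports)
    p.start()
    p.join()
    # Nothing the child imported propagates back to the parent.
    print('parent sees webbrowser:', 'webbrowser' in sys.modules)  # False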

Second, a module can be a set of classes, external library bindings, functions, etc., and some of them quite probably can't be pickled, at least with pickle. Here is the list of what can be pickled in Python 2.7 and in Python 3.x. There are even libraries that give you 'more pickling power', like dill. However, I'm not sure pickling whole modules is a good idea at all, not to mention that you have slow imports and yet you want to serialize them and send them back to the parent process. Even if you manage to do it, it doesn't sound like the best approach.
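In fact, you can reproduce the exact error from your screenshot in a few lines; pickle simply refuses module objects (the message wording varies between Python versions):

import pickle
import json

try:
    pickle.dumps(json)  # this is effectively what multiprocessing does with your args
except TypeError as exc:
    print(exc)  # e.g. "cannot pickle 'module' object"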

Some ideas on how to change the perspective:

  1. Revise which modules you need and why. Maybe other modules can give you similar functionality. Maybe these modules are heavyweight, bring too much with them, and the cost is great compared to what you get.

  2. If loading the modules is slow, try to make a script that is always running, so you only pay the import cost once instead of on every run.

  3. If you really need those modules, maybe you can split their use between two processes, where each process does its own thing. Example: one process fetches a page, the other processes its content, and so on. That way each process loads only what it needs, but you have to deal with passing messages between processes (see the sketch after this list).
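A minimal sketch of idea 3, assuming a made-up fetch/process split with example.com standing in for a real URL: each process pays only for its own imports, and the processes exchange results over a multiprocessing.Queue instead of trying to pass module objects through args.

import multiprocessing


def fetch(queue):
    from urllib import request  # the slow import is paid in this process only
    with request.urlopen('https://example.com') as response:
        queue.put(response.read().decode())


def process(queue):
    html = queue.get()  # blocks until the fetch process delivers the page
    print('received %d characters' % len(html))


if __name__ == '__main__':
    q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=fetch, args=(q,)),
               multiprocessing.Process(target=process, args=(q,))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()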

Ilija