
I have 10,000s of custom modules (compiled to '.so') that I'd like to use in Python. The modules will be used sequentially (one after the other, not at the same time).

Normally, the code would look something like this:

# list with all the paths to all modules
listPathsToModules = [.....]
# loop through the list of all modules
for pathToModule in listPathsToModules:
    # import the module
    import pathToModule
    # run a function in 'pathToModule' and get the results
    pathToModule.MyFunction( arg1, arg2, arg3 )

Running this, here is what I find:

the avg. time it takes to import one module: 0.0024625 [sec]

the avg. time it takes to run the module's function: 1.63727e-05 [sec]

meaning, it takes ~100x more time to import a module than to run a function that is in it!
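
For reference, the kind of measurement that produces these averages (a minimal sketch; somemodule, MyFunction, and the arguments are placeholders standing in for one of the actual '.so' modules):

import time

arg1 = arg2 = arg3 = None                   # placeholder arguments

start = time.time()
import somemodule                           # placeholder: one of the '.so' modules
importTime = time.time() - start

start = time.time()
somemodule.MyFunction( arg1, arg2, arg3 )   # placeholder call, as above
callTime = time.time() - start

print importTime, callTime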

Is there anything that can be done to speed up the time it takes to load a module in Python? What steps would you take to optimize this situation, given the need to load and run many (assume 10,000s of) modules?

user3262424
  • Is there really no way you could merge them? Importing a module implies disk access, so it's understandably several orders of magnitude slower than calling a function. Maybe you could put all those modules on some sort of RAM drive, which could greatly speed up access. – Boaz Yaniv May 17 '11 at 02:46
  • @Boaz Yaniv: merging them would create (say) one huge '.so' file of size ~ 1-2[GB]. Will it really be faster to import a file of such size? – user3262424 May 17 '11 at 02:48
  • Oh, a RAM drive is definitely another good place to start. – ncoghlan May 17 '11 at 02:51
  • One *big* import wouldn't be great, but a few hundred medium-sized ones would likely be an improvement over thousands of little ones. – ncoghlan May 17 '11 at 02:52

1 Answer


I would first question whether import is really the technique you want to be using to access thousands of code fragments - the full import process is quite expensive, and loading (non-shared) dynamic modules at all isn't particularly cheap, either.

Second, the code as you have written it clearly isn't what you're actually doing. The import statement doesn't accept a string at runtime; you would have to be using importlib.import_module() or calling __import__() directly.
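
For example, a dynamically imported version of the loop in the question might look like this (a minimal sketch; it assumes the directory holding the '.so' files is already on sys.path, and reuses the question's placeholder names):

import importlib
import os

listPathsToModules = []         # fill with the paths to the '.so' files
arg1 = arg2 = arg3 = None       # placeholder arguments, as in the question

for pathToModule in listPathsToModules:
    # importlib wants a module *name*, not a file path
    moduleName = os.path.splitext(os.path.basename(pathToModule))[0]
    # returns the module object; importing an already-imported name is
    # only a cheap sys.modules cache lookup
    module = importlib.import_module(moduleName)
    # run the module's function, as in the question
    module.MyFunction(arg1, arg2, arg3)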

Finally, the first step in optimising this would be to ensure that the first directory on sys.path is the one that contains all these files. You may also want to run Python with the -vv flag to dial up the verbosity on the import attempts. Be warned that this is going to get very noisy if you're doing that many imports.
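
For instance (a minimal sketch; the directory name is a placeholder):

import sys

# make the directory holding the '.so' files the *first* entry on
# sys.path, so every import probes it before anything else
sys.path.insert(0, '/path/to/so/modules')

The verbose mode is then just a matter of invoking the script as python -vv yourscript.py.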

ncoghlan
  • @ncoghlan: thank you. You write that `import` is expensive. Is there an alternative to `import` if I want to be able to call a function that is in a file? – user3262424 May 17 '11 at 03:13
  • Without more specifics, it's hard to say. My question was really whether you could redesign the overall system such that you just had *one* function (or a few functions) and appropriately parameterise them to handle the ~10000 different tasks. Then your program could just read the parameter set from a file instead of having to load all these different modules (see the sketch after these comments). – ncoghlan May 17 '11 at 14:45
  • @ncoghlan: thank you. The question is: how will this 'central' function handle the ~10000 different modules? Wouldn't it still need to import them in order to call the functions they store (there are `N modules`, each containing `one function`)? – user3262424 May 17 '11 at 15:10
  • No, that's the part I'm questioning. Where did those 10000 modules come from? What do they do? Are the functions actually completely different, or are they essentially the same algorithm with a few options tweaked? Can you process them *once* and transform them into a different, more efficient form for future reuse? Basically, I'm suggesting you may be focusing on a micro-optimisation, when the performance problem really lies in the overall architecture of the system. But, while it definitely sounds suspect, I don't have enough info to say that for sure. – ncoghlan May 18 '11 at 04:48
  • This isn't really offering any advice to solve the problem, it's just a commentary on redesigning the application to side-step the issue. – Xealot May 22 '11 at 16:52
  • What is "the problem" though? The problem isn't "I want to import all these modules" or even "I want to call all of these shared libraries", it is "I want to do whatever it is that this C-level code is doing". It's important not to get stuck on micro-optimising a poor algorithm and lose sight of the actual task that is to be achieved. (The classic example being that the gain from switching from an O(N**2) algorithm to an O(N) one will dwarf anything that could be done to speed up the inner loop of the original algorithm). – ncoghlan May 23 '11 at 08:09