I have a Python (3.8) program with quite a lot of imports. It is intended to run on an HPC cluster (on which I have no admin privileges) where most of the filesystems are mounted over NFS.
On my personal machine the program takes 3-4 s to start up, do all its imports, print some version information, and exit. This is a bit slower than I'd like, but totally fine for now. On the HPC cluster I see similar start-up times when the system is quiet, but under heavy load the start-up time is catastrophic, taking multiple minutes.
From profiling, essentially all of the start-up time is spent in imports, and I suspect the culprit on the cluster is slow NFS (this is a bit of a guess, but it matches the evidence). My question, therefore, is: how can I work around slow imports caused by NFS?
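For context, the profiling was nothing fancy; `python -X importtime` gives a per-module breakdown, and a crude timing loop along these lines tells the same story (the module names below are placeholders, not my actual dependency list):

```python
# Rough sketch of timing individual top-level imports.
# Module names are placeholders; `python -X importtime` gives finer detail.
import time

for name in ("numpy", "scipy", "pandas"):
    t0 = time.perf_counter()
    __import__(name)
    print(f"{name}: {time.perf_counter() - t0:.2f} s")
```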
My thinking was to cache the imports to something like /tmp (which is a local hard drive). You can do this by setting PYTHONPYCACHEPREFIX to store the compiled pyc files there, which works, but it only reduces the start-up time by 10-20 s on average (the total is still 1-2 min). I think the bottleneck is now the check of the source files' timestamps (which still happens over NFS) before the fast access to the local pyc, but I could be wrong. Is there some mechanism by which the source files themselves could be cached in the same way? Note it's not just the source files of my own project, which are easy enough to manage, but also those of all the other packages it imports.
Appreciate the help.