I am running an experiment whose result is high-dimensional and structured. I use a MultiIndex DataFrame to represent the result object and multiprocessing to compute and fill it. The result set is quite large: it can easily reach millions to billions of entries. If the result is 3D, I can have the function that does the computation return a df and then combine the pieces into a Panel afterwards.
When the result object is 5D or higher, I find that returning a subset of the result from the function run in each process is neither straightforward nor memory-efficient. However, it also does not work if I let each process write its result directly to the global MultiIndex DataFrame (the result), which is created before the computation starts. After the computation, the values of the result df are still all NaN, exactly as when it was created.
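The behaviour described above can be reproduced with a minimal sketch. It assumes a small 3-level index standing in for the real 5D+ result and a platform where the "fork" start method is available (e.g. Linux). The names `work`, `run_demo`, the level names and the `value` column are hypothetical and only serve the illustration.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd

# Hypothetical 3-level index standing in for the real 5D+ result.
index = pd.MultiIndex.from_product(
    [range(2), range(3), range(4)], names=["a", "b", "c"]
)
# Global result, pre-allocated with NaN before the computation starts.
result = pd.DataFrame({"value": np.nan}, index=index)


def work(a):
    # Runs in a child process and writes into the child's own copy of `result`.
    for b in range(3):
        for c in range(4):
            result.loc[(a, b, c), "value"] = a + b + c
    # Number of entries that are filled in the child's copy.
    return int(result["value"].notna().sum())


def run_demo():
    # Assumption: the "fork" start method is available on this platform.
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        filled_in_children = pool.map(work, [0, 1])
    # Number of entries that are still NaN in the parent's `result`.
    return filled_in_children, int(result["value"].isna().sum())


if __name__ == "__main__":
    filled, still_nan = run_demo()
    print(filled)     # entries filled inside the child processes
    print(still_nan)  # entries still NaN in the parent process
```

In such a setup each child reports that it filled entries, while every entry in the parent's `result` is still NaN, because each process works on its own copy of the global variable.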
Any suggestions are greatly appreciated!