I have a simple script in which a function does some calculation on a Pandas.Series
object which I want to parallel process. I have made the Pandas.Series
object as a shared memory object so that different processess can use it.
My code is given below.
from multiprocessing import shared_memory
import pandas as pd
import numpy as np
import multiprocessing
s = pd.Series(np.random.randn(50))
s = s.to_numpy()
# Create a shared memory variable shm which can be accessed by other processes
shm_s = shared_memory.SharedMemory(create=True, size=s.nbytes)
b = np.ndarray(s.shape, dtype=s.dtype, buffer=shm_s.buf)
b[:] = s[:]
# create a dictionary to store the results and which can be accessed after the processes works
mgr = multiprocessing.Manager()
pred_sales_all = mgr.dict()
forecast_period =1000
# my sudo function to run parallel process
def predict_model(b,model_list_str,forecast_period,pred_sales_all):
c = pd.Series(b)
temp_add = model_list_str + forecast_period
temp_series = c.add(model_list_str)
pred_sales_all[str(temp_add)] = temp_series
# parallel processing with shared memory
if __name__ == '__main__':
model_list = [1, 2, 3, 4]
all_process = []
for model_list_str in model_list:
# setup a process to run
process = multiprocessing.Process(target=predict_model, args=(b,model_list_str, forecast_period, pred_sales_all))
# start the process we need to join() them separately else they will finish execution before moving to next process
process.start()
# Append all process together
all_process.append(process)
# Finish execution of all process
for p in all_process:
p.join()
This code is working in ubuntu I checked. But when I run this in windows I am getting the following error.
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Also I tried the solution mentioned here in the stack-overflow question
What is wrong with the code and can someone solve the issue ? Is my parallelization code wrong ?