How can I share a huge DataFrame between many worker processes without duplicating it every time a process is created?
Take a look at this code:
from functools import partial
from multiprocessing import Pool
from pandas import DataFrame

def work(task, df):
    print(f'Working on task {task}, DataFrame located at {hex(id(df))}')

def main():
    huge_df = DataFrame(...)
    tasks = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
    with Pool(processes=min(4, len(tasks))) as pool:
        pool.map(partial(work, df=huge_df), tasks.items())

if __name__ == '__main__':
    main()
To be clear, huge_df is only needed for read-only operations.
Is there a better way to do this?
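One approach I'm aware of relies on the 'fork' start method: if the DataFrame is bound to a module-level global before the Pool is created, forked workers inherit it via copy-on-write, so only the small task items get pickled per call. A minimal sketch of that idea (the global name shared_df and the placeholder data are just for illustration, and it assumes fork is available, i.e. not Windows):

from multiprocessing import Pool, set_start_method
from pandas import DataFrame

# Module-level global; set in the parent before the Pool is created so forked
# workers inherit it copy-on-write instead of receiving a pickled copy per task.
shared_df = None

def work(task):
    # Read-only access to the inherited DataFrame.
    print(f'Working on task {task}, DataFrame located at {hex(id(shared_df))}')

def main():
    global shared_df
    shared_df = DataFrame({'x': range(10)})  # placeholder for the real huge frame
    tasks = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
    with Pool(processes=min(4, len(tasks))) as pool:
        pool.map(work, tasks.items())

if __name__ == '__main__':
    set_start_method('fork')  # assumption: fork is available (Linux/macOS), not on Windows
    main()

Even then, I'm not sure how well copy-on-write holds up in practice, since CPython's reference counting writes to object headers and can gradually force pages to be copied. Is there a cleaner or more portable pattern for this, e.g. something based on shared memory?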