I am trying to read a large file into a pandas DataFrame in a separate process using the multiprocessing module. In the code attached below, the read_file function completes successfully, because "2" gets printed, but the Python command window then gets stuck at p1.join() and "3" is never printed.
I have read that a multiprocessing queue has a size limit; if that is why my DataFrame isn't getting through, can anyone suggest an alternative way to read a large pandas structure in a separate process?
Ultimately I hope to read two large pandas structures simultaneously and concatenate them in the main function, roughly halving the script's run time.
import pandas as pd
from multiprocessing import Process, Queue

def read_file(numbers, retrns):
    Product_Master_XLSX = pd.read_excel(r'G:\PRODUCT MASTER.xlsx', sheetname='Table')
    retrns.put(Product_Master_XLSX)
    print "2"

if __name__ == "__main__":
    arr = [1]
    queue1 = Queue()
    p1 = Process(target=read_file, args=(arr, queue1))
    p1.start()
    print "1"
    p1.join()
    print "3"
    print queue1.get()
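For reference, here is a minimal sketch of the two-process layout I am aiming for (written in Python 3 syntax, with a small synthetic DataFrame standing in for the Excel read, since I can't share the real file). I suspect the fix is to drain each queue *before* calling join(), since a child process's queue feeder thread does not exit until its items have been consumed:

```python
import pandas as pd
from multiprocessing import Process, Queue

def read_frame(retrns):
    # Stand-in for pd.read_excel(...): a small synthetic frame keeps the
    # sketch runnable without the real .xlsx file.
    retrns.put(pd.DataFrame({"a": range(5)}))

def parallel_read():
    q1, q2 = Queue(), Queue()
    p1 = Process(target=read_frame, args=(q1,))
    p2 = Process(target=read_frame, args=(q2,))
    p1.start()
    p2.start()
    # Drain each queue BEFORE joining: the child's feeder thread blocks
    # until its queued items are consumed, so join() before get() hangs.
    df1 = q1.get()
    df2 = q2.get()
    p1.join()
    p2.join()
    # Concatenate the two frames in the parent, as described above.
    return pd.concat([df1, df2], ignore_index=True)

if __name__ == "__main__":
    print(len(parallel_read()))
```

I am not certain this get-before-join ordering is the intended pattern, so corrections are welcome.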