First off, thanks to anyone who responds. SO has saved so much time in my programming projects. I appreciate it!
My code iterates through a huge dataframe. Here is a simplified example:
# len(df) = 1,000,000
for i in range(1, len(df)):
    df.iloc[i, 1] = df.iloc[i, 1] * 40
NOTE: my real code does something far more complicated. The question is the same, but I didn't want to post lines and lines of code. Essentially, I want to know how to multiprocess over portions of a dataframe, using this as an example.
I want to split up the work using multiprocessing: one worker should handle rows 1-500,000 and the next worker rows 500,001-1,000,000.
Here is my thought:
from multiprocessing import Pool

def jobs(jobstart, jobend):
    # len(df) = 1,000,000
    for i in range(jobstart, jobend):
        df.iloc[i, 1] = df.iloc[i, 1] * 40

if __name__ == '__main__':
    p = Pool(processes=2)
    results = p.starmap(jobs, [(1, 500000), (500001, 1000000)])
    p.close()
    print(results)
Why doesn't this work? The error:
File "C:/Files for Local/fisher_scrapper/frappy.py.py", line 238, in <module>
results=p.starmap(jobs, [(0,50),(51,100)])
File "C:\Users\TLCLA\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 298, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "C:\Users\TLCLA\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 683, in get
raise self._value
JSONDecodeError: Expecting value