I am trying to find the correct syntax for using a for loop with Dask `delayed`. I have found several tutorials and other questions, but none fit my case, which is extremely basic.
First, is this the correct way to run a for-loop in parallel?
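%%time
from dask import delayed

list_names = ['a', 'b', 'c', 'd']
keep_return = []

@delayed
def loop_dummy(target):
    # Busy loop standing in for the real per-target work
    for i in range(1000000000):
        pass
    print('passed value is:' + target)
    return 1

# Build one lazy task per name; nothing runs until compute() is called
for i in list_names:
    c = loop_dummy(i)
    keep_return.append(c)

total = delayed(sum)(keep_return)
total.compute()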
This produced:
passed value is:a
passed value is:b
passed value is:c
passed value is:d
Wall time: 1min 53s
If I run this in serial,
%%time
list_names = ['a', 'b', 'c', 'd']
keep_return = []

def loop_dummy(target):
    for i in range(1000000000):
        pass
    print('passed value is:' + target)
    return 1

for i in list_names:
    c = loop_dummy(i)
    keep_return.append(c)
it is actually slightly faster:
passed value is:a
passed value is:b
passed value is:c
passed value is:d
Wall time: 1min 49s
I have seen examples stating that Dask adds a small amount of overhead, but this computation seems to run long enough to justify that overhead, no?
My actual for loop involves heavier computation where I build a model for various targets.
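For anyone who wants to reproduce this faster, here is a scaled-down sketch of the same test. The helper names `busy_loop` and `run_delayed` and the iteration counts are made up for illustration and are not part of my real code. It passes the `scheduler` keyword to `compute()` so the same task graph can be timed under Dask's single-threaded, threaded, and process-based schedulers. As far as I understand, `delayed` uses the threaded scheduler by default, and a pure-Python loop holds the GIL, which may be why I see no speedup above.

```python
import time

from dask import delayed


def busy_loop(target, n=200_000):
    """Pure-Python, CPU-bound stand-in for the real per-target work."""
    for _ in range(n):
        pass
    return 1


def run_delayed(names, scheduler="threads", n=200_000):
    """Build one delayed task per name, sum the results, and time the run."""
    tasks = [delayed(busy_loop)(name, n) for name in names]
    total = delayed(sum)(tasks)
    start = time.perf_counter()
    result = total.compute(scheduler=scheduler)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    # "synchronous" runs everything in the calling thread as a baseline;
    # "processes" avoids the GIL at the cost of starting worker processes.
    for scheduler in ("synchronous", "threads", "processes"):
        result, seconds = run_delayed(names, scheduler=scheduler, n=5_000_000)
        print(f"{scheduler}: total={result}, {seconds:.2f}s")
```

The timings will depend on the machine and on how much of the real work releases the GIL, so this only compares the schedulers and is not a benchmark of my model-building loop.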