Let's say I have list with 100 items called mylist
.
I have function that performs certain operation with each element of the list.
I order to speed things up I want define n
parallel tasks.
Each task should take 100/n
list items and process them.
I understand all about executors and core settings which I need to change to enable parallelism, I need some basic pointer on how to set tasks right.
My idea is to build it like this:
imports...
mylist = [item0, item2,...,item99]
n=5
def myfunction(sub_list):
"""This is a function that will run within the DAG execution"""
"""Procesing list elements"""
# Generate 5 tasks
for i in range(1,len(lst), n):
task = PythonOperator(
task_id='myfunction_' + str(i),
python_callable=myfunction,
op_kwargs={'sub_list': mylist[i:i + n]},
dag=dag,
)
task
I've assebmled this algorithm according to documentation. Is this proper way of doing this?