How do I change the number of parallel tasks in PySpark?
In other words, how can I change the number of parallel map tasks that run on my PC? My goal is to plot a speedup chart against the number of map tasks.
Sample code (run in the pyspark shell, where sc is the SparkContext):

words = sc.parallelize(["scala", "java", "hadoop"]) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
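
To clarify what I am trying to measure, here is a rough sketch of the kind of experiment I have in mind. I am assuming that the local[N] master setting and the numSlices argument of parallelize are the right knobs for controlling the number of parallel tasks; the helper name and the input size are just illustrative:

import time
from pyspark import SparkConf, SparkContext

def time_word_count(num_tasks):
    """Run the word count with num_tasks local threads/partitions and return the elapsed time."""
    conf = SparkConf().setMaster("local[{}]".format(num_tasks)).setAppName("speedup-test")
    sc = SparkContext(conf=conf)
    data = ["scala", "java", "hadoop"] * 100000  # repeat the words so the job takes measurable time
    start = time.perf_counter()
    (sc.parallelize(data, numSlices=num_tasks)   # numSlices controls how many partitions (tasks) are used
       .map(lambda word: (word, 1))
       .reduceByKey(lambda a, b: a + b)
       .collect())
    elapsed = time.perf_counter() - start
    sc.stop()
    return elapsed

# measure the runtime at different levels of parallelism to build the speedup chart
for n in [1, 2, 4, 8]:
    print(n, time_word_count(n))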
If you understand my purpose but I have asked the question in the wrong way, I would appreciate it if you could correct me.
Thanks