I have a table which has StartDate and EndDate columns. I want to partition the data month-wise and run the algorithm on each monthly partition.
Currently, I am filtering the DataFrame by date (StartDate and EndDate) and running the algorithm for each month sequentially: first Jan, then Feb, then March, and so on. By running the algorithm sequentially for each month, we are not able to reap the benefits of Spark's parallelism.
I want to run the algorithm in parallel for each month (Jan, Feb, March, ...) to take advantage of Spark's parallelism.
To add more information: I am running the algorithm (which has a set of steps A, B, C, D) sequentially for each month in a loop. I want to run these monthly runs concurrently. A minimal sketch of what I am doing now is below.
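Here is a simplified sketch of the current sequential loop. The table name, month boundaries, and `runAlgorithm` function are placeholders for illustration only; the real algorithm contains steps A, B, C, D:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("MonthlyAlgorithm").getOrCreate()

// Placeholder for the real algorithm (steps A, B, C, D are not shown here).
def runAlgorithm(df: DataFrame): Unit = {
  // ... steps A, B, C, D ...
}

// Source table with StartDate and EndDate columns (name is a placeholder).
val data: DataFrame = spark.table("my_table")

// Illustrative month boundaries only.
val months = Seq(
  ("2023-01-01", "2023-01-31"),
  ("2023-02-01", "2023-02-28"),
  ("2023-03-01", "2023-03-31")
)

// Current behaviour: each month is filtered and processed one after the other,
// so the driver waits for month N to finish before starting month N + 1.
for ((monthStart, monthEnd) <- months) {
  val monthlySlice = data
    .filter(col("StartDate") >= monthStart && col("EndDate") <= monthEnd)
  runAlgorithm(monthlySlice) // steps A, B, C, D for this month
}
```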
Please advise: how do we execute the algorithm in parallel for each month?