I have 3 processes that currently run in sequence:

  1. Write dataframe to S3 Bucket A
  2. Write dataframe to S3 Bucket B
  3. Write dataframe to database
# 1. Overwrite the snapshot in bucket A
final_df.write.mode('overwrite').parquet(s3_bucket_a_path)
# 2. Partitioned append into bucket B
final_df.write.partitionBy("PART_1", "PART_2").mode('append').parquet(s3_bucket_b_path)
# 3. Append to the database over JDBC
write_to_jdbc(logger, transformed_dataframe, jdbc_url, db_user_nm, db_user_pwd, 'test_table', 'append')

Is there a way to execute these in parallel?

  • You can spawn multiple threads in your code and start each of these tasks in a separate thread with the same SparkSession. – Vikas Nov 03 '18 at 06:39
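
A minimal sketch along the lines of that comment, using Python's concurrent.futures with the names from the question (final_df, transformed_dataframe, write_to_jdbc, jdbc_url, db_user_nm, db_user_pwd); s3_bucket_a_path and s3_bucket_b_path are hypothetical placeholders for the bucket URIs:

from concurrent.futures import ThreadPoolExecutor

def write_bucket_a():
    # Overwrite the snapshot in bucket A
    final_df.write.mode('overwrite').parquet(s3_bucket_a_path)

def write_bucket_b():
    # Partitioned append into bucket B
    final_df.write.partitionBy("PART_1", "PART_2").mode('append').parquet(s3_bucket_b_path)

def write_db():
    # Append to the database over JDBC
    write_to_jdbc(logger, transformed_dataframe, jdbc_url, db_user_nm, db_user_pwd, 'test_table', 'append')

# Submitting the three writes to a thread pool lets Spark schedule the
# resulting jobs concurrently against the same SparkSession.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task) for task in (write_bucket_a, write_bucket_b, write_db)]
for f in futures:
    f.result()  # re-raises the exception if any write failed

If memory allows, calling final_df.cache() (and materializing it once, e.g. with final_df.count()) before submitting the writes avoids recomputing the DataFrame's lineage once per job.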

0 Answers