I have 3 processes that currently run in sequence:

  1. Write dataframe to S3 Bucket A
  2. Write dataframe to S3 Bucket B
  3. Write dataframe to database
# 1. Overwrite the snapshot in bucket A
final_df.write.mode('overwrite').parquet(s3_bucket_a_path)
# 2. Partitioned append into bucket B
final_df.write.partitionBy("PART_1", "PART_2").mode('append').parquet(s3_bucket_b_path)
# 3. Append to the database over JDBC
write_to_jdbc(logger, transformed_dataframe, jdbc_url, db_user_nm, db_user_pwd, 'test_table', 'append')

Is there a way to execute these in parallel?

  • You can spawn multiple threads in your code and start each of these tasks in a separate thread with the same SparkSession. – Vikas Nov 03 '18 at 06:39
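
A minimal sketch along the lines of that comment, using Python's concurrent.futures with the names from the question (final_df, transformed_dataframe, write_to_jdbc, jdbc_url, db_user_nm, db_user_pwd); s3_bucket_a_path and s3_bucket_b_path are hypothetical placeholders for the bucket URIs:

from concurrent.futures import ThreadPoolExecutor

def write_bucket_a():
    # Overwrite the snapshot in bucket A
    final_df.write.mode('overwrite').parquet(s3_bucket_a_path)

def write_bucket_b():
    # Partitioned append into bucket B
    final_df.write.partitionBy("PART_1", "PART_2").mode('append').parquet(s3_bucket_b_path)

def write_db():
    # Append to the database over JDBC
    write_to_jdbc(logger, transformed_dataframe, jdbc_url, db_user_nm, db_user_pwd, 'test_table', 'append')

# Submitting the three writes to a thread pool lets Spark schedule the
# resulting jobs concurrently against the same SparkSession.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(task) for task in (write_bucket_a, write_bucket_b, write_db)]
for f in futures:
    f.result()  # re-raises the exception if any write failed

If memory allows, calling final_df.cache() (and materializing it once, e.g. with final_df.count()) before submitting the writes avoids recomputing the DataFrame's lineage once per job.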

0 Answers