I'm using coalesce(1) to write a set of records to an S3 bucket as a single CSV file. The write takes far too long for only 505 records:

dataset.coalesce(1).write().csv("s3a://bucketname/path");

I should mention that before this write I run an encryption step that changes the values of some fields in each row of the dataset. There I'm using repartition(200):

dataset.javaRDD().repartition(200).map(r -> func());

If I skip the encryption step, the write takes less than a minute.
What is causing the process to slow down?
How can I improve the performance?

1 Answer

Avoid using coalesce(1) where you can; consider partitioning the output with partitionBy instead. I suspect the function you use to encrypt the data is what takes most of the time, since it has to iterate through all the records. You could try changing it from map to flatMap and check the performance.

Please have a look at the difference between map and flatMap.
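One way to check the suspicion that the encryption function itself is slow is to time it outside Spark on the same number of records. The sketch below is only an illustration: `encryptField` is a hypothetical stand-in for the `func()` in the question (its real implementation is not shown there), AES-GCM is an assumed cipher, and the class and method names are made up for this example.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class EncryptTiming {

    // Hypothetical stand-in for the asker's func(): encrypts one field value
    // with AES-GCM and returns Base64(iv || ciphertext || tag).
    static String encryptField(SecretKey key, SecureRandom rng, String value) throws Exception {
        byte[] iv = new byte[12];                 // fresh 96-bit nonce per value
        rng.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    // Encrypts every value in order, one record at a time (no Spark involved).
    static List<String> encryptAll(SecretKey key, List<String> values) throws Exception {
        SecureRandom rng = new SecureRandom();
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            out.add(encryptField(key, rng, v));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();

        // 505 dummy values, matching the record count in the question.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 505; i++) {
            records.add("value-" + i);
        }

        long start = System.nanoTime();
        List<String> encrypted = encryptAll(key, records);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Encrypted " + encrypted.size() + " records in " + elapsedMs + " ms");
    }
}
```

If the real function handles 505 records quickly when run like this, the function is probably not the bottleneck and the partitioning around it is the next thing to look at. The Spark API documentation notes that a drastic coalesce such as coalesce(1) can make the upstream computation run on fewer nodes than intended, and that repartition(1) adds a shuffle step so the earlier stages stay parallel.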

Welcome to the community. Please accept the answer if you find it useful.

Sundeep Pidugu