Is there any service in EMR, or any other way, to see a progress bar (or elapsed time) when I submit a job that writes Parquet files to S3?
The code:
df.write.partitionBy("date").mode("append").parquet("s3n://uk-adp-vault/semasio/output")
You can open the YARN ResourceManager web UI on port 8088 of the EMR master node (you may need an SSH tunnel to reach it). It lists the running applications along with their memory utilization.
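If you would rather read the elapsed time from a script than from the browser, the ResourceManager also serves a REST API on the same port. Below is a minimal sketch, assuming the master node is reachable from where you run it; `fetch_running_apps` and `summarize_apps` are hypothetical helper names, not part of any library. Note that the `progress` value YARN reports for Spark applications is often not meaningful, so this sketch only uses `elapsedTime`.

```python
import json
from urllib.request import urlopen


def fetch_running_apps(master_host, port=8088):
    """Ask the YARN ResourceManager REST API for the running applications.

    master_host is an assumption: the DNS name of your EMR master node.
    """
    url = f"http://{master_host}:{port}/ws/v1/cluster/apps?states=RUNNING"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Reduce the REST payload to id, name, elapsed seconds and tracking URL."""
    # YARN returns {"apps": null} when nothing matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        {
            "id": app["id"],
            "name": app["name"],
            "elapsed_s": app["elapsedTime"] / 1000.0,  # elapsedTime is in ms
            "tracking_url": app.get("trackingUrl"),
        }
        for app in apps
    ]
```

You could then call `summarize_apps(fetch_running_apps("<master-node-dns>"))` in a loop to watch the elapsed time of the write job.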
From there, follow the ApplicationMaster link for your application, which opens the Spark UI. It shows the progress of the job, with details for each stage and task.
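The same numbers the Spark UI displays are also exposed as JSON by Spark's monitoring REST API (under `/api/v1/applications/<app-id>/jobs` on the UI address), so a rough text progress bar can be built from them. This is only a sketch: `job_progress` and `render_bar` are hypothetical helpers, and it assumes you have already fetched and decoded the jobs list, which carries `status`, `numTasks` and `numCompletedTasks` for each job.

```python
def job_progress(jobs):
    """Fraction of tasks completed across the jobs that are currently running."""
    running = [job for job in jobs if job.get("status") == "RUNNING"]
    total = sum(job["numTasks"] for job in running)
    done = sum(job["numCompletedTasks"] for job in running)
    return done / total if total else 0.0


def render_bar(fraction, width=20):
    """Render a fraction between 0 and 1 as a fixed-width text progress bar."""
    filled = int(round(fraction * width))
    return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:.0%}"
```

For example, a single running job with 50 of 200 tasks finished gives `render_bar(job_progress(jobs))` as `[#####---------------] 25%`. Task counts are only a proxy for progress, since tasks can differ a lot in duration.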