My use case is to write to DynamoDB from a Spark application. As I have limited write capacity for DynamoDB and do not want to increase it for cost implications, how can I limit the Spark application to write at a regulated speed?
Can this be achieved by reducing the partitions to 1 and then executing foreachPartition()
?
I already have auto-scaling enabled but don't want to increase it any further.
Please suggest other ways of handling this.
EDIT: This needs to be achieved when the Spark application is running on a multi-node EMR cluster.