I have a JavaRDD that I need to persist to some external DB.
What would be the best way to do it so that I don't overwhelm my DB with an enormous number of connections? That is, I would like to control the number of connection pools created in my Spark app.
I believe that rdd.foreach would be a bad option, as it might create a connection pool for each row. I assume that rdd.foreachPartition is probably better, but I'm not quite sure.
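
To make the per-partition idea concrete, here is a minimal sketch of what I have in mind. It is plain Java with no Spark dependency, so it only shows the body I would run inside foreachPartition. The names PartitionWriter, RowSink, SinkFactory and writePartition are hypothetical placeholders of mine, not Spark or JDBC APIs; RowSink stands in for a real connection (for example a java.sql.Connection borrowed from a pool).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PartitionWriter {

    /** Hypothetical stand-in for a real DB connection. */
    public interface RowSink extends AutoCloseable {
        void writeBatch(List<String> rows);

        @Override
        void close();
    }

    /** Hypothetical factory; a real one might borrow from a per-executor pool. */
    public interface SinkFactory {
        RowSink open();
    }

    /**
     * Writes every row of ONE partition through a single sink, in batches.
     * Returns the number of rows written.
     */
    public static int writePartition(Iterator<String> rows, SinkFactory factory, int batchSize) {
        if (!rows.hasNext()) {
            return 0; // empty partition: do not open a connection at all
        }
        int written = 0;
        // One sink per partition, closed even if a write fails.
        try (RowSink sink = factory.open()) {
            List<String> batch = new ArrayList<>();
            while (rows.hasNext()) {
                batch.add(rows.next());
                if (batch.size() == batchSize) {
                    sink.writeBatch(new ArrayList<>(batch));
                    written += batch.size();
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) { // flush the final, partially filled batch
                sink.writeBatch(new ArrayList<>(batch));
                written += batch.size();
            }
        }
        return written;
    }
}
```

In the Spark job I would then expect to call it as rdd.foreachPartition(rows -> PartitionWriter.writePartition(rows, factory, 500)), assuming a JavaRDD<String> and a factory that is serializable so it can be shipped to the executors. As far as I understand, the number of simultaneously open connections would then be bounded by the number of tasks running at the same time rather than by the number of rows, and could be lowered further by reducing the partition count (for example with coalesce) before writing.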