Backpressure is just a fancy term for capping the maximum receiving rate, so it doesn't work the way you seem to expect. What actually needs tuning here is the reading end.
In classical JDBC usage, connectors expose a fetchSize property on PreparedStatements, so you can start by configuring that fetchSize along the lines of the following answers:
Unfortunately, this alone may not solve all of your performance issues with your RDBMS.
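As a minimal sketch of what that looks like when reading through Spark's JDBC source (the fetchsize option is forwarded to the underlying statement), assuming a PostgreSQL database; the URL, table, and credentials are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-fetchsize").getOrCreate()

// Placeholder connection details -- adjust to your environment.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "public.events")
  .option("user", "reader")
  .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
  // Rows fetched per round trip on the underlying statement; driver
  // defaults are often tiny, or 0 ("fetch everything") for PostgreSQL.
  .option("fetchsize", "1000")
  .load()
```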
Keep in mind that, compared to the basic JDBC reader, which runs on a single worker, partitioning the data by an integer column or by a sequence of predicates loads it in distributed mode but introduces a couple of problems. In your case, a high number of concurrent reads can easily throttle the database.
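For illustration, here is a minimal sketch of both partitioned variants; the URL, table, column names, and bounds are placeholders:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-partitioned").getOrCreate()
val url = "jdbc:postgresql://db-host:5432/mydb"  // placeholder

// Variant 1: partition on an integer column. Spark splits the
// [lowerBound, upperBound] range into numPartitions strides (the bounds do
// NOT filter rows). numPartitions also caps the number of concurrent
// connections, so keep it low enough not to throttle the database.
val byColumn = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "public.events")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "8")
  .load()

// Variant 2: one partition per predicate; each string becomes a WHERE clause.
val props = new Properties()
val predicates = Array("region = 'us'", "region = 'eu'", "region = 'apac'")
val byPredicates = spark.read.jdbc(url, "public.events", predicates, props)
```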
To deal with these problems, I suggest the following:
- If available, consider using specialized data sources over JDBC connections.
- Consider using specialized or generic bulk import/export tools like Postgres COPY or Apache Sqoop (see the sketch after this list).
- Be sure to understand the performance implications of the different JDBC data source variants, especially when working with a production database.
- Consider using a separate replica for Spark jobs.
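As an illustration of the COPY route mentioned above, here is a minimal sketch using the PostgreSQL JDBC driver's CopyManager; the connection details, table, and output path are placeholders:

```scala
import java.io.FileWriter
import java.sql.DriverManager
import org.postgresql.PGConnection

// Placeholder connection details -- adjust to your environment.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://db-host:5432/mydb", "reader", sys.env.getOrElse("DB_PASSWORD", ""))
try {
  // The PostgreSQL driver exposes server-side COPY through CopyManager,
  // which streams the table out much faster than row-by-row ResultSet reads.
  val copyApi = conn.unwrap(classOf[PGConnection]).getCopyAPI
  val out = new FileWriter("/tmp/events.csv")  // placeholder output path
  try {
    val rows = copyApi.copyOut("COPY public.events TO STDOUT WITH (FORMAT csv)", out)
    println(s"Exported $rows rows")
  } finally out.close()
} finally conn.close()
```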
If you wish to know more about reading data with the JDBC source, I suggest you read the following:
Disclaimer: I'm the co-author of that repo.