I'm writing a Python app that reads data from Cassandra, does some processing via the Spark API, and then writes the results to a MySQL database via JDBC.
The Spark processing itself takes only a few seconds, but writing the final DataFrame (about 5000 rows) to MySQL takes around 10 minutes, so I'm trying to figure out how to speed up that part.
This is the code I'm using to write the PySpark DataFrame to a MySQL table:
df_name.write.jdbc(url=mysql_url.value,
                   table=tbl_name,
                   mode=mode.value,
                   properties={'user': mysql_user.value,
                               'password': mysql_pwd.value,
                               'driver': mysql_jdbc.value})
I allocated --executor-memory 4g --driver-memory 2g --total-executor-cores 6
to run my app.
The DataFrame I want to write to MySQL has 5740 partitions; I followed several pieces of advice and reduced the number of partitions (via the .repartition() function, roughly as sketched below), but it doesn't speed up the writing phase.
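For reference, this is roughly how I applied the repartitioning before the write; the target partition count (8 here) is just an illustrative value, not something I have settled on:

# Collapse the 5740 partitions into a handful so the write opens far fewer
# JDBC connections; the exact count here is arbitrary.
df_name.repartition(8) \
    .write.jdbc(url=mysql_url.value,
                table=tbl_name,
                mode=mode.value,
                properties={'user': mysql_user.value,
                            'password': mysql_pwd.value,
                            'driver': mysql_jdbc.value})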
I also followed the trick in [1] and set rewriteBatchedStatements=true and useServerPrepStmts=false (see the sketch below), but there has been no improvement in performance so far.
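For completeness, here is roughly how those flags are being passed, together with Spark's own batchsize write option; as far as I understand, the Connector/J flags can go either on the JDBC URL or into the properties dict, and the batchsize value here is just an example I've been experimenting with:

# rewriteBatchedStatements / useServerPrepStmts are MySQL Connector/J settings;
# batchsize is a Spark JDBC write option (rows per batch insert, default 1000).
df_name.write.jdbc(url=mysql_url.value,      # e.g. jdbc:mysql://host:3306/db
                   table=tbl_name,
                   mode=mode.value,
                   properties={'user': mysql_user.value,
                               'password': mysql_pwd.value,
                               'driver': mysql_jdbc.value,
                               'rewriteBatchedStatements': 'true',
                               'useServerPrepStmts': 'false',
                               'batchsize': '10000'})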
Are there any other settings or tricks I could try to speed up writing a DataFrame to MySQL via JDBC?
Thank you for your help.
[1] Low JDBC write speed from Spark to MySQL
P.S.: Our MySQL database is in production, and we don't see performance issues with any other applications.