
I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?

Code below:

```python
df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')
```
Takashi Lee
  • Spark [supports batch inserts](https://issues.apache.org/jira/browse/SPARK-10040) since 1.6.0, so if you are on an older version, upgrade. If you can't upgrade for some reason, get the RDD from your DataFrame and do the batch insert by hand in a foreachPartition loop (a sketch of that approach follows these comments). – Vitalii Kotliarenko Apr 28 '16 at 12:57
  • I'm using Spark 1.6.1, but the write speed is only about 100 records/s. Would repartitioning the DataFrame help here? – Takashi Lee Apr 28 '16 at 17:24
  • The default batch size is 1000. Maybe increasing it would speed things up? I'm giving that a try now! (You can set it through the `batchsize` property.) – Daniel Darabos Jul 18 '16 at 10:17
  • Where do you get `schema` from? @DanielDarabos – Gocht Dec 14 '16 at 20:47
  • A more typical example would be to load the DataFrame directly via `spark.read` or use `RDD.toDF`. – Daniel Darabos Dec 14 '16 at 23:02
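A minimal sketch of that hand-rolled foreachPartition batch insert, assuming a MySQL driver such as PyMySQL is installed on the executors; the host, credentials, table, and column names are placeholders:

```python
import pymysql

def insert_partition(rows):
    # One connection per partition; host/credentials are placeholders.
    conn = pymysql.connect(host='xx', user='xx', password='xx', database='xx')
    try:
        with conn.cursor() as cur:
            batch = []
            for row in rows:
                batch.append((row[0], row[1]))  # adapt to your schema
                if len(batch) >= 1000:
                    # Flush every 1000 rows as one batched statement.
                    cur.executemany(
                        "INSERT INTO xx (col1, col2) VALUES (%s, %s)", batch)
                    conn.commit()
                    batch = []
            if batch:  # flush the remainder
                cur.executemany(
                    "INSERT INTO xx (col1, col2) VALUES (%s, %s)", batch)
                conn.commit()
    finally:
        conn.close()

df.rdd.foreachPartition(insert_partition)
```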

1 Answer


The answer in https://stackoverflow.com/a/10617768/3318517 worked for me: add `rewriteBatchedStatements=true` to the connection URL. (See Configuration Properties for Connector/J.)

My benchmark went from 3325 seconds to 42 seconds!
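For reference, a sketch of the same write with the flag in the URL and a larger `batchsize` (the default is 1000, as noted in the comments); the host, database, and credentials below are placeholders:

```python
df.write.jdbc(
    url='jdbc:mysql://host:3306/db?rewriteBatchedStatements=true',
    table='xx',
    mode='overwrite',
    properties={
        'user': 'xx',
        'password': 'xx',
        'driver': 'com.mysql.jdbc.Driver',
        'batchsize': '10000',  # rows per JDBC batch; tune for your data
    },
)
```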

Daniel Darabos