I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?
Code below:
df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')
The answer at https://stackoverflow.com/a/10617768/3318517 worked for me: add rewriteBatchedStatements=true to the connection URL. (See Configuration Properties for Connector/J.)
My benchmark went from 3325 seconds to 42 seconds!
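A minimal sketch of what the write looks like with the flag set; the host, port, database, table, and credentials below are placeholders, not values from the question:

```python
# Hypothetical connection URL -- substitute your own host/port/database.
# rewriteBatchedStatements=true lets Connector/J rewrite batched INSERTs
# into multi-row statements, which is where the speedup comes from.
jdbc_url = (
    "jdbc:mysql://db-host:3306/mydb"
    "?rewriteBatchedStatements=true"
)

def write_df_to_mysql(df, table):
    """Overwrite `table` in MySQL with the contents of `df`."""
    df.write.jdbc(
        url=jdbc_url,
        table=table,
        mode="overwrite",
        properties={
            "user": "me",                          # placeholder credentials
            "password": "secret",
            "driver": "com.mysql.cj.jdbc.Driver",  # Connector/J driver class
        },
    )
```

The `batchsize` write option is also worth tuning alongside this flag, since the rewrite only pays off when inserts are actually batched.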