I'm writing a large dataframe to a mysql database (Aurora on AWS RDS).
I'm doing roughly the following (pseudocode):
rdd1 = sc.textFile("/some/dir")
rdd2 = rdd1.map(addSchema)  # map each line to a Row with the target schema
df = sqlContext.createDataFrame(rdd2)
df.write.jdbc(url="...", table="mydb.table", mode="append")
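For context, a fuller sketch of the write is below. The host, credentials, driver class, partition count, and the batchsize / rewriteBatchedStatements settings are placeholders or things I've seen suggested for MySQL JDBC writes, not values I've verified fix anything:

props = {
    "user": "myuser",
    "password": "mypassword",
    "driver": "com.mysql.jdbc.Driver",
    "batchsize": "10000",  # rows per JDBC batch insert (assuming this option is honored on the write path)
}
# rewriteBatchedStatements is a MySQL Connector/J URL parameter that batches inserts on the driver side
url = "jdbc:mysql://myhost:3306/mydb?rewriteBatchedStatements=true"
# repartitioning first limits the number of concurrent connections opened against Aurora
df.repartition(20).write.jdbc(url=url, table="mydb.table", mode="append", properties=props)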
The dataframe has roughly 650,000 rows, and the write sometimes (yes, only sometimes) dies mid-insert, or at least that's what appears to be happening.
In the stderr there is a line somewhere toward the bottom saying the app is exiting with status 1 (error), but there isn't any other error message aside from that final bit.
Is df.write.jdbc known to be an unreliable method for writing large sets of data to a MySQL database? How can I save my large dataframe to a MySQL db without the job dying so frequently?
Edit: Spark 2.0, EMR 5.0