
Is it possible to convert an entire SQL database into Parquet format? Since writing a schema for every table is time consuming, is there a simple way to make this work on any database with the latest versions of Spark and Parquet on a cluster? A simple way of doing it per table, I guess, would be:

import org.apache.spark.sql.SQLContext
import java.util.HashMap

val sqlctx = new SQLContext(sc)
val options: HashMap[String, String] = new HashMap
val url_total = "jdbc:mysql://127.0.0.1:3306/DBNAME?user=USERNAME&password=PWD"
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url_total)
options.put("dbtable", "test")

// sqlctx.load(source, options) is deprecated; the DataFrameReader API replaces it
val df = sqlctx.read.format("jdbc").options(options).load()

// saveAsParquetFile is also deprecated; write through the DataFrameWriter API
df.write.parquet("file:///somefile.parquet")
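
Extending that per-table snippet to the whole database might look something like the untested sketch below. It reuses `sqlctx` and `url_total` from above, pulls the table names out of `information_schema` (MySQL-specific; `DBNAME` and the output path are placeholders), and lets Spark infer each table's schema from the JDBC metadata, so no schemas have to be written by hand:

// Untested sketch: list the tables in the schema, then dump each one to Parquet.
// "DBNAME" and the hdfs output path are placeholders.
val tableNames = sqlctx.read
  .format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("url", url_total)
  .option("dbtable",
    "(SELECT table_name FROM information_schema.tables WHERE table_schema = 'DBNAME') AS t")
  .load()
  .collect()
  .map(_.getString(0))

// Each table's schema is inferred from the JDBC metadata.
tableNames.foreach { table =>
  sqlctx.read
    .format("jdbc")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("url", url_total)
    .option("dbtable", table)
    .load()
    .write
    .parquet(s"hdfs:///parquet-dump/$table")
}

The loop itself runs on the driver and each read/write is an ordinary Spark job, so no user-defined objects should need to be serialized.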
cryptickp
  • Is there a reason you want to do this with Spark? I would probably use Sqoop + Hive to do this. But your approach should work. Also, since you're saving the file locally, you might run into an OOM depending on how big your database is. – Joe Widen Mar 24 '16 at 22:25
  • I'm not very familiar with Sqoop, and I'm trying to do the same with MongoDB as well; there is already a connector which helps with that. Spark also helps with querying when there is an incremental DB backup. – cryptickp Mar 24 '16 at 23:55
  • Your solution above should work; are you running into an issue? – Joe Widen Mar 25 '16 at 01:04
  • Although I need a solution for the entire DB to be saved in Parquet format, per-table saving is causing _Task not serializable_. I've followed the approach suggested in this [solution](http://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou) and this [example](https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/LoadSimpleJdbc.scala), but I'm still getting the same exception. – cryptickp Mar 25 '16 at 16:27
  • I've not tried it, but here's a potential solution by @shashir: [JDBC2PARQUET](https://github.com/shashir/jdbc2parquet) – cryptickp Mar 27 '16 at 22:14
