I am trying to read a big table (~100M rows) into Spark. The table is in PostgreSQL and we are reading it as follows:
val connectionProperties = new Properties()
connectionProperties.put("user", "$USER")
connectionProperties.put("password", "$PASSWORD")
// val connection = DriverManager.getConnection("$SERVER", "$USER", "$PASSWORD")
//connection.isClosed()
val jdbc_url = "jdbc:postgresql://$HOST:$PORT/$DB"
val df = spark.read.option("inferSchema", true).jdbc(jdbc_url, "$TABLE", connectionProperties)
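(Aside: because the table has ~100M rows we also pass partitioning options so the read does not end up in a single partition. A minimal sketch, where the column name id, the bounds, and the partition count are hypothetical placeholders for our actual primary key and row count:)
// Partitioned read (sketch): "id", the bounds and 16 partitions are placeholders
val partitionedDf = spark.read
  .option("partitionColumn", "id")
  .option("lowerBound", "0")
  .option("upperBound", "100000000")
  .option("numPartitions", "16")
  .jdbc(jdbc_url, "$TABLE", connectionProperties)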
But our table has 2 columns of the PostgreSQL money datatype (values formatted like $100,000.23). When Spark reads them, they get mapped to Double and the read throws an exception on the formatted values.
We have tried:
a) Casting the column data to Double. That does not help, because Spark already maps the column to Double and the cast still fails on the commas in the values.
b) Using the PostgreSQL dialect (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala), but the data still arrives in the DataFrame with the commas (see the sketch after this list).
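To make the dialect route concrete, here is a minimal sketch of the direction we are exploring: a custom JdbcDialect that surfaces the money columns as strings, so we can strip the formatting ourselves after the read. The column name amount is hypothetical, and we are assuming the PostgreSQL driver reports these columns with type name "money".
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Custom dialect (sketch): map money columns to StringType instead of Double
val moneyAsStringDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")
  override def getCatalystType(sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (typeName.equalsIgnoreCase("money")) Some(StringType) else None
}
JdbcDialects.registerDialect(moneyAsStringDialect)

// After the read, strip the currency symbol and thousands separators, then cast.
// "amount" is a hypothetical name standing in for our two money columns.
val rawDf = spark.read.jdbc(jdbc_url, "$TABLE", connectionProperties)
val cleaned = rawDf.withColumn("amount", regexp_replace(col("amount"), "[$,]", "").cast("double"))
As far as we understand, other columns should still be handled by the built-in PostgresDialect since our dialect returns None for everything except money, but we are not sure this is the right approach.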
Any help will be really appreciated.