3

Here is my Spark SQL code, where I am trying to read a Presto table based on this guide: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

 val df = spark.read
 .format("jdbc")
 .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
 .option("url", "jdbc:presto://localhost:8889/mycatalog")
 .option("query", "select * from mydb.mytable limit 1")
 .option("user", "myuserid")
 .load()

I am getting the following exception: Unrecognized connection property 'url'

Exception in thread "main" java.sql.SQLException: Unrecognized connection property 'url'
at com.facebook.presto.jdbc.PrestoDriverUri.validateConnectionProperties(PrestoDriverUri.java:345)
at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:102)
at com.facebook.presto.jdbc.PrestoDriverUri.<init>(PrestoDriverUri.java:92)
at com.facebook.presto.jdbc.PrestoDriver.connect(PrestoDriver.java:87)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:68)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:341)

This seems to be related to https://github.com/prestodb/presto/issues/9254, where `url` is not a recognized connection property in Presto, so it looks like the fix needs to be made on the Spark side. Is there any other workaround for this issue?

PS:

Spark Version: 3.1.1
presto-jdbc version: 0.245 
Mohana B C
Raj

3 Answers

3

Looks like a Spark bug that was fixed in 3.3:

https://issues.apache.org/jira/browse/SPARK-36163
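
To illustrate what the stack trace in the question suggests: the Presto driver's `validateConnectionProperties` rejects any connection property it does not know, and `url` (a Spark data source option) reached it as one. The sketch below is a simplified model of that interaction, not the actual Spark or Presto code. The object name, both key sets, and both helper functions are hypothetical and exist only for illustration.

```scala
import java.sql.SQLException
import scala.util.Try

object ConnectionPropertySketch {
  // Hypothetical allow-list standing in for the properties a strict driver accepts.
  val knownDriverProperties: Set[String] = Set("user", "password")

  // Hypothetical set of option names intended for the Spark JDBC source, not the driver.
  val sparkOnlyOptions: Set[String] = Set("url", "driver", "query", "dbtable")

  // Models a strict driver: fail on the first key that is not on the allow-list.
  def validateStrict(props: Map[String, String]): Unit =
    props.keys.find(k => !knownDriverProperties.contains(k)).foreach { k =>
      throw new SQLException(s"Unrecognized connection property '$k'")
    }

  // Models the kind of fix needed on the caller side: drop source-level options
  // before handing the remaining entries to the driver.
  def driverProperties(options: Map[String, String]): Map[String, String] =
    options.filter { case (k, _) => !sparkOnlyOptions.contains(k) }

  def main(args: Array[String]): Unit = {
    val options = Map(
      "url"   -> "jdbc:presto://localhost:8889/mycatalog",
      "query" -> "select * from mydb.mytable limit 1",
      "user"  -> "myuserid"
    )

    // Forwarding every option unchanged trips the strict check.
    println(Try(validateStrict(options)).isFailure)

    // After filtering, only driver-level properties remain and validation passes.
    validateStrict(driverProperties(options))
    println(driverProperties(options).keySet)
  }
}
```

In this model the failure goes away as soon as the caller stops forwarding source-level options, which is why an upgrade or a driver jar without the strict check can both make the error disappear.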

odonnry
2

There is no issue with the Spark or Presto JDBC driver. I don't think the URL you specified will work.

You should change it to the format below.

jdbc:presto://localhost:8889/mycatalog

UPDATE

Not sure how it's working with Spark versions < 3. As a workaround, you can use another jar where the strict config check has been removed, as specified here.

Mohana B C
  • Sorry, that was a typo. I have updated the post. I indeed gave `jdbc:presto://localhost:8889/mycatalog` – Raj Aug 30 '21 at 20:48
  • I'm not getting any issues in spark 2.4.5 with presto-jdbc-0.260 – Mohana B C Aug 30 '21 at 20:50
  • Updated the post. I am using `spark 3.1.1` + `presto-jdbc-0.245` on `EMR 6.3.0`. `presto-jdbc-0.260` also has the same issue on `spark 3.1.1` – Raj Aug 30 '21 at 21:09
  • Try with another jar. Updated the answer. – Mohana B C Aug 30 '21 at 22:32
  • The amended jar does work. Thank you! Although it seems ARRAY is not supported: `Exception in thread "main" java.sql.SQLException: Unsupported type ARRAY at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getCatalystType(JdbcUtils.scala:247)`. That would be for a separate question. BTW, here is the JIRA I have opened for this issue: https://issues.apache.org/jira/projects/SPARK/issues/SPARK-36616?filter=allopenissues – Raj Aug 31 '21 at 14:37
1

@odonnry is correct that the issue was fixed in Spark 3.3.x, but for anyone who cannot upgrade to Spark 3.3.x and is trying to use Trino, I created a workaround, linked below, based on the Jira issue linked by @Mohana:

https://github.com/amitferman/trino

Amit Ferman