I'm experimenting with Spark connections to a local MySQL instance.

I have a MySQL JDBC jar that I'm passing in:

pyspark --jars /path/to/jar

Then I create my SQLContext, etc. When I start connecting, one version throws an error and the other does not.

SQLContext.read.jdbc(url="jdbc:mysql://localhost:3306?user=root", table="spark.words")

This throws a driver not found error.

SQLContext.read.format("jdbc").option("url","jdbc:mysql://localhost:3306?user=root").option("dbtable","spark.words").option("driver", 'com.mysql.jdbc.Driver').load()

This works as expected.

I thought the two were roughly the same, with the former being a convenience method for the latter. What's the difference, and why does the SQLContext.read.jdbc version error out?

zero323
Kristian
  • The methods should be roughly equivalent, but these calls are not. You don't provide the `driver` parameter in the first case. – zero323 Jun 15 '16 at 16:30
  • Thanks for that, but looking at https://spark.apache.org/docs/1.6.1/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader I don't know if one can even supply that param. When I try to include `driver="..."`, it throws an error saying "driver" is an unknown key. – Kristian Jun 15 '16 at 16:44
  • [You can provide properties](https://stackoverflow.com/questions/30983982/how-to-use-jdbc-source-to-write-and-read-data-in-pyspark). – zero323 Jun 15 '16 at 16:45

1 Answer

Generally speaking, these two methods should be equivalent, although there can be edge cases where things don't work as expected (for example, DataFrameWriter with a JDBC source seems to exhibit slightly different behavior between format("jdbc") and jdbc(...)).

In this particular case the answer is simple, though. These calls are not equivalent because the second one explicitly declares the driver class, while the first one does not.

If you want them to behave the same way, you should provide a properties dict:

sqlContext.read.jdbc(
    url=..., table=...,
    properties={"driver": "com.mysql.jdbc.Driver"})
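One way to keep the two call styles in sync is to build their arguments from a single shared config, so the driver class cannot be left out of one of them. The following is only a minimal sketch: the helper names `jdbc_kwargs` and `jdbc_options` are hypothetical (they are not part of PySpark), the URL, table, and driver class are the ones from the question, and the actual reads appear only as comments because they assume a live `sqlContext` and a reachable MySQL instance.

```python
# Hypothetical helpers (not part of PySpark): derive the arguments for both
# reader styles from the same inputs so the driver is always declared.

MYSQL_DRIVER = "com.mysql.jdbc.Driver"  # driver class named in the question


def jdbc_kwargs(url, table, driver=MYSQL_DRIVER):
    """Keyword arguments intended for sqlContext.read.jdbc(**kwargs)."""
    return {"url": url, "table": table, "properties": {"driver": driver}}


def jdbc_options(url, table, driver=MYSQL_DRIVER):
    """Options intended for sqlContext.read.format("jdbc").options(**opts)."""
    return {"url": url, "dbtable": table, "driver": driver}


url = "jdbc:mysql://localhost:3306?user=root"
kwargs = jdbc_kwargs(url, "spark.words")
opts = jdbc_options(url, "spark.words")

# With a live SQLContext (assumed, not created here) the two reads would be:
#   df1 = sqlContext.read.jdbc(**kwargs)
#   df2 = sqlContext.read.format("jdbc").options(**opts).load()
```

Note the naming difference the helpers hide: `jdbc(...)` takes `table` plus a `properties` dict, while the `format("jdbc")` route takes `dbtable` and `driver` as plain options.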
zero323