
I am using Spark JDBC to connect to a MySQL table. When Spark reads the table, the schema it infers marks every column as nullable, whereas the primary key columns should have nullable = false. I am using version 5.1.8 of the MySQL JDBC driver.

I am reading the table with:

```scala
session.read
  .jdbc(
    s"${destOptions.getProperty("connection_string")}?useCompression=true&useSSL=false&autoReconnect=true",
    config.srcTable,
    andLogicPredicate,
    destOptions)
  .selectExpr(primaryKeyArray: _*)
```
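Below is a minimal sketch (not part of the original question) of how the inferred nullability can be checked; `session`, `destOptions`, `config.srcTable`, and `andLogicPredicate` are assumed to be defined as in the snippet above.

```scala
// Sketch: inspect the nullability Spark infers for the JDBC source.
// `session`, `destOptions`, `config.srcTable` and `andLogicPredicate` are
// assumed to exist as in the question; the URL options are illustrative.
val df = session.read.jdbc(
  s"${destOptions.getProperty("connection_string")}?useSSL=false",
  config.srcTable,
  andLogicPredicate,
  destOptions)

// printSchema shows "nullable = true" for every column, including the primary keys.
df.printSchema()

// Equivalent programmatic check on the StructType fields.
df.schema.fields.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))
```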
  • Could you post the code you're using to load data and infer schema? – Andronicus Jan 16 '19 at 06:45
  • Have you tried this [solution](https://stackoverflow.com/questions/33193958/change-nullable-property-of-column-in-spark-dataframe) for changing schema? – Andronicus Jan 16 '19 at 07:07
  • Yes, I have tried that, but that particular solution won't help here. My use case is to read a source table to get all primary keys in a time range and form a predicate query to fire against the destination table in another database cluster. Now, since this is a read of the destination table, the query that Spark forms is `select * from destTable where (composite_primary_key1 IS NOT NULL AND composite_primary_key2 IS NOT NULL) AND (composite_primary_key1='ABCD' AND composite_primary_key2=123)`, which results in a full table scan. – guru107 Jan 16 '19 at 07:16
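
A quick way to see the filters Spark pushes to the JDBC source (a hedged sketch, not from the original thread; `df` and the column names are placeholders for the destination DataFrame and its composite primary key) is to look at `PushedFilters` in the physical plan:

```scala
import org.apache.spark.sql.functions.col

// Hypothetical filter mirroring the predicate described in the comment above;
// column names are placeholders for the composite primary key.
val filtered = df.filter(
  col("composite_primary_key1") === "ABCD" &&
  col("composite_primary_key2") === 123)

// The extended plan prints the JDBC scan's PushedFilters, which include the
// IsNotNull entries alongside the EqualTo predicates, corresponding to the
// "... IS NOT NULL AND ... = ..." WHERE clause seen on the MySQL side.
filtered.explain(true)
```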

0 Answers