Spark JDBC API can't use built-in functions

Question

I want to load the result of a subquery on an Impala table as a single Dataset.

My code looks like this:

// Pass a parenthesized, aliased subquery in place of a table name
String subQuery = "(select to_timestamp(unix_timestamp(now())) as ts from my_table) t";
Dataset<Row> ds = spark.read().jdbc(myImpalaUrl, subQuery, prop);

But it fails with this error:

Caused by: java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Timestamp.

The unix_timestamp function works, but to_timestamp fails. Why?

I found that this line in org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute() causes the problem:

sqlText = s"SELECT $columnList FROM ${options.table} $myWhereClause"

$columnList wraps each column name in double quotes, like "col_name"; when I remove the quotes, it works fine.
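To make the quoting issue concrete, here is a simplified, Spark-free model of how that query text is assembled. It is only a sketch under stated assumptions: `SelectTextSketch` and `buildSelect` are hypothetical names, not Spark code; the WHERE clause is omitted; and each column name is passed through a quoting function the way Spark passes it through `JdbcDialect.quoteIdentifier` (the default dialect wraps names in double quotes). The three variants compare the default double quotes, no quoting, and backticks, which is what Impala uses to quote identifiers.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class SelectTextSketch {

    // Simplified model (hypothetical helper, not Spark's code): quote each column
    // name, join the names with commas, then append the table expression.
    static String buildSelect(List<String> columns, String table, UnaryOperator<String> quote) {
        String columnList = columns.stream().map(quote).collect(Collectors.joining(","));
        return "SELECT " + columnList + " FROM " + table;
    }

    public static void main(String[] args) {
        String subQuery = "(select to_timestamp(unix_timestamp(now())) as ts from my_table) t";
        List<String> columns = List.of("ts");

        // Default-dialect style: double quotes, which Impala reads as a string literal.
        System.out.println(buildSelect(columns, subQuery, c -> "\"" + c + "\""));
        // No quoting at all: a plain column reference.
        System.out.println(buildSelect(columns, subQuery, c -> c));
        // Backticks: Impala's identifier quoting, still a column reference.
        System.out.println(buildSelect(columns, subQuery, c -> "`" + c + "`"));
    }
}
```

Only the first variant turns `ts` into a string; the other two refer to the column. Backticks have the extra advantage of still working if a column name happens to be a reserved word.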

https://stackoverflow.com/questions/29844144/better-way-to-convert-a-string-field-into-timestamp-in-spark might be useful. (Norbert, Oct 24 '18 at 04:06)

score 2 · Accepted Answer · answered Oct 24 '18 at 04:19

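I solved this problem by registering a custom dialect. The default dialect wraps every column name in double quotes, and Impala treats a double-quoted name such as "ts" as a string literal rather than a column reference, so the query returns the text ts instead of the timestamp value and the driver fails to convert it. A dialect that leaves names unquoted avoids this: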

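import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

// Custom Impala dialect: leave column names unquoted instead of
// wrapping them in double quotes as the default dialect does.
JdbcDialect impalaDialect = new JdbcDialect() {
    @Override
    public boolean canHandle(String url) {
        // Use this dialect for Impala JDBC URLs
        return url.startsWith("jdbc:impala") || url.contains("impala");
    }

    @Override
    public String quoteIdentifier(String colName) {
        // Return the name as-is so Impala sees a column reference, not a string literal
        return colName;
    }
};

// Register before calling spark.read().jdbc(...) so Spark picks this dialect up
JdbcDialects.registerDialect(impalaDialect);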
