In my Spark code I write my DataFrame as Parquet files on HDFS. I then created an external table over those Parquet files in Big SQL, declaring the columns in a different order than they appear in the Parquet schema. Querying that table fails with the error below.
If I query the same table in Hive it works fine; Hive maps the columns by name and returns the expected output.
Does Big SQL support mapping columns by name when creating an external table over Parquet files?
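For reference, the setup is roughly the following. This is only a minimal sketch to show the shape of the problem: the table name, column names, and HDFS path are placeholders, and the Big SQL DDL in the comments is an approximation of what I run, not the exact statement.

# Spark side: write the DataFrame as Parquet.
# The Parquet schema ends up in this column order: (id, name, amount).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write-parquet").getOrCreate()

df = spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.0)],
    ["id", "name", "amount"],
)
df.write.mode("overwrite").parquet("hdfs:///data/dummytable")

# Big SQL side (run separately, not through Spark): the external table is
# declared with the columns in a different order than the Parquet schema,
# roughly like this:
#
#   CREATE EXTERNAL HADOOP TABLE schema_name.dummytable (
#     name   VARCHAR(100),
#     id     INT,
#     amount DOUBLE
#   )
#   STORED AS PARQUET
#   LOCATION '/data/dummytable';
#
# Hive returns correct results for SELECT * on this table (mapping columns by
# name); the same SELECT in Big SQL fails with the error below.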
Big SQL error log:
DB2 LOG: The statement failed because a Big SQL component encountered an error. Component receiving the error: "BigSQL IO". Component returning the error: "UNKNOWN". Log entry identifier: "[BSL-3-25371e740]".. SQLCODE=-5105, SQLSTATE=58040, DRIVER=4.22.36
Big SQL log:
2019-04-25 03:42:14,848 ERROR com.ibm.biginsights.bigsql.dfsrw.reader.DfsBaseReader [Master-3-S:5.1001.1.0.0.724] : [BSL-3-25371e740] Exception raised by Reader at node: 3 Scan ID: S:5.1001.1.0.0.724 Table: mc400.hutdnp_ext Spark: false VORC: false VPQ: true VAVRO: false VTEXT: false VRCFILE: false VANALYZE: false
Exception Label: UNMAPPED(java.lang.NullPointerException)
java.lang.NullPointerException
at com.ibm.biginsights.bigsql.dfsrw.reader.parquet.DfsVectorizedPrimitiveColumnReader.decodeDictionaryIds(DfsVectorizedPrimitiveColumnReader.java:506)
at com.ibm.biginsights.bigsql.dfsrw.reader.parquet.DfsVectorizedPrimitiveColumnReader.readBatch(DfsVectorizedPrimitiveColumnReader.java:173)
at com.ibm.biginsights.bigsql.dfsrw.reader.parquet.DfsVectorizedParquetRecordReader.nextBatch(DfsVectorizedParquetRecordReader.java:428)
at com.ibm.biginsights.bigsql.dfsrw.reader.parquet.DfsVectorizedParquetRecordReader.next(DfsVectorizedParquetRecordReader.java:347)
at com.ibm.biginsights.bigsql.dfsrw.reader.parquet.DfsParquetSplit2Batch.split2Batch(DfsParquetSplit2Batch.java:98)
at com.ibm.biginsights.bigsql.dfsrw.jaro.DfsSplitManager$SplitRunnable.run(DfsSplitManager.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:785)
at com.ibm.biginsights.bigsql.dfsrw.jaro.DfsSplit2BatchThread.run(DfsSplit2BatchThread.java:58)
SQL statement:
select * from schema_name.dummytable
Execution log:
Run time: 1.275 s
Status: FAILED
Database: HDP-DEV-BIGSQL