I am having trouble reading a Spark DataFrame from a Hive table. I stored the DataFrame as:
dataframe.coalesce(n_files).write.option("mergeSchema", "true").mode("overwrite").parquet(table_path)
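For context, the full write step looks roughly like this; n_files, table_path, and the source of the DataFrame are placeholders, not my real values:

# Rough, self-contained version of the write step; all names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

n_files = 10                        # placeholder target file count
table_path = "/hdfs/path/to/table"  # placeholder location of the external table

dataframe = spark.table("my_db.my_source")  # placeholder source
(dataframe
    .coalesce(n_files)
    .write
    .option("mergeSchema", "true")  # note: mergeSchema is documented as a read option
    .mode("overwrite")
    .parquet(table_path))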
When I try to read this DataFrame back from the Hive table and call .show() on it, it breaks with the following error:
java.lang.UnsupportedOperationException: parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
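For clarity, the read that fails is just a plain read of the Hive table, roughly like this (the table name is a placeholder):

# Reading through the Hive metastore; the table name is a placeholder.
df = spark.table("my_db.my_table")
df.show()  # fails with the UnsupportedOperationException above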
How can I find which column is the root cause of this error? I tried to follow the answer here, but I am able to load the DataFrame perfectly fine by reading the parquet files directly:
df = spark.read.option("mergeSchema", "True").parquet("/hdfs path to parquets")
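Since the direct parquet read works, one thing that might narrow down the offending column is diffing the metastore schema against the schema Spark infers from the files; a rough sketch (the table name is a placeholder):

# Compare the Hive metastore schema with the parquet file schema.
# Only metadata is touched here, so this does not trigger the decode error.
hive_fields = {f.name: f.dataType for f in spark.table("my_db.my_table").schema.fields}
file_fields = {f.name: f.dataType for f in spark.read.parquet("/hdfs path to parquets").schema.fields}

for name, hive_type in hive_fields.items():
    file_type = file_fields.get(name)
    if file_type != hive_type:
        # e.g. hive: LongType, files: IntegerType would match the decodeToLong error
        print(name, "hive:", hive_type, "files:", file_type)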
- The said Hive table is an external table. My guess is that it has something to do with the table properties, but what should I be looking at? (See the sketch after this list for how I dump the table definition.)
- I cannot use saveAsTable. I need to write directly to the path due to a certain requirement.
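For reference, this is how I can dump the table definition and properties, in case it points at what to look for (the table name is a placeholder):

# Dump the full definition of the external table, including location,
# serde, and table properties; the table name is a placeholder.
spark.sql("DESCRIBE FORMATTED my_db.my_table").show(100, truncate=False)
spark.sql("SHOW TBLPROPERTIES my_db.my_table").show(truncate=False)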