
I created a Spark SQL table by calling .saveAsTable on my DataFrame. That command succeeded, but now when I query the table from Hive, the Parquet files appear unreadable. I'm seeing this error:

"Failed with exception java.io.IOException:java.io.IOException: hdfs://ip:8020/user/hive/warehouse/people/part-r-00001.parquet not a SequenceFile"

These are the steps I followed in spark-shell:

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val path = "test.json"
scala> val people = sqlContext.jsonFile(path)
scala> people.saveAsTable("people")

After that, I opened the Hive command prompt:

hive> select * from people;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://IP:8020/user/hive/warehouse/people/part-r-00001.parquet not a SequenceFile
Time taken: 0.276 seconds

How can I get results from my Hive table (people), and how can I resolve the above exception? Please let me know if anything needs to change configuration-wise.

Thanks in advance.

Sai
  • try setting `spark.sql.hive.convertMetastoreParquet` to false – Sebastian Piu Jan 19 '16 at 19:36
  • Hi Sebastian, thanks for the reply – Sai Jan 20 '16 at 07:23
  • Hi Sebastian, thanks for the reply. I made the change you suggested, i.e. I added `spark.sql.hive.convertMetastoreParquet false` to my spark-defaults.conf and then restarted my cluster, but I am still getting the same error. Can you please help with this? – Sai Jan 20 '16 at 07:28
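
For reference, the setting suggested in the comments can also be applied at runtime in spark-shell rather than through spark-defaults.conf. A minimal sketch, assuming a HiveContext named sqlContext as in the question (note this only changes how Spark itself reads the table, not how the Hive CLI reads it):

scala> // Tell Spark to use Hive's Parquet SerDe instead of its built-in Parquet support
scala> sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
scala> sqlContext.sql("select * from people").show()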

2 Answers


This may be related to https://issues.apache.org/jira/browse/SPARK-14927.

It seems saveAsTable creates the Hive table in a Spark-specific format. If you see a message like

Persisting partitioned data source relation `XX Table` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive. Input path(s)

then the Spark-specific format is probably the cause.

Instead, you can create the Hive table first with sqlContext.sql("CREATE TABLE ..."), and then write your data into HDFS with df.write.save.
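
A minimal sketch of that approach, continuing the question's spark-shell session (the two-column schema and the HDFS path are assumptions based on the question, not part of the original answer; df.write requires Spark 1.4+, and the broken people table is assumed to have been dropped first):

scala> // Create the table through Hive DDL so it uses Hive's own Parquet SerDe
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS people (name STRING, age INT) STORED AS PARQUET")
scala> // Write plain Parquet files into the table's warehouse directory
scala> people.write.format("parquet").mode("append").save("hdfs://ip:8020/user/hive/warehouse/people")

After that, select * from people in Hive should be able to read the files with its own Parquet SerDe rather than failing on a Spark-specific layout.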

Also see this question, this one, and this blog post.

phil

Tables created with saveAsTable won't work from Hive if Hive and Spark are using different Parquet SerDe versions. You can try a different serialization format instead, e.g.:

df.write.format("orc").saveAsTable("table")

or

df.write.format("json").saveAsTable("table")
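
Applied to the question's session, that would look like the following (people_orc is an illustrative table name; the ORC data source requires a HiveContext, which the question is already using):

scala> people.write.format("orc").saveAsTable("people_orc")

and then in Hive:

hive> select * from people_orc;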

Sebastian Piu