
How to read an ORC transactional Hive table in Spark?

I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table but am not able to read the actual data.

Here is the complete scenario:

hive> create table default.hello (id int, name string)
      clustered by (id) into 2 buckets
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

hive> insert into default.hello values (10, 'abc');

Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

spark.sql("select * from hello").show()

Output: only the column headers id and name, with no rows.

Ajinkya

3 Answers


Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call instead. Please refer to my answer to this issue on my Git page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
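For reference, here is a minimal sketch of the JDBC approach in the Spark shell, assuming HiveServer2 is reachable at localhost:10000 and the Hive JDBC driver is on the classpath; the URL, user, and password are placeholders to adjust for your cluster:

// Read the ACID table through HiveServer2 over JDBC, bypassing
// Spark's native ORC reader (which cannot read unmerged delta files).
val acidDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "hello")
  .option("user", "hive")   // placeholder credentials
  .option("password", "")
  .load()

acidDF.show()

Note that Spark's generic JDBC source can return column names prefixed with the table name when going through HiveServer2, so some column renaming may be needed; see the Git page above for the full approach.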

Gowtham SB

Spark (as of version 2.3) is not fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

This compaction should make the data visible. Note that it runs asynchronously, so it takes some time before the data is compacted.
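If you want to check whether the compaction has finished, Hive's SHOW COMPACTIONS statement lists pending and completed compaction requests:

hive> SHOW COMPACTIONS;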

parisni

You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(By default, show() displays 20 rows.)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.

pbahr