
How to read an ORC transactional Hive table in Spark?

I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table but am not able to read the actual data.

Here is the complete scenario:

hive> create table default.hello (id int, name string)
      clustered by (id) into 2 buckets
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

hive> insert into default.hello values (10, 'abc');

Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

spark.sql("select * from hello").show()

Output: only the column headers id and name, with no rows.

Ajinkya

3 Answers


Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call instead. Please refer to my answer to this issue on my Git page: https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
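For reference, here is a minimal sketch of the JDBC approach in the Spark shell, assuming HiveServer2 is reachable at localhost:10000 and the Hive JDBC driver is on the classpath; the URL, user, and password are placeholders to adjust for your cluster:

// Read the ACID table through HiveServer2 over JDBC, bypassing
// Spark's native ORC reader (which cannot read unmerged delta files).
val acidDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "hello")
  .option("user", "hive")   // placeholder credentials
  .option("password", "")
  .load()

acidDF.show()

Note that Spark's generic JDBC source can return column names prefixed with the table name when going through HiveServer2, so some column renaming may be needed; see the Git page above for the full approach.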

Gowtham SB

Spark (as of version 2.3) is not fully compliant with Hive transactional tables. The workaround is to run a compaction on the table after any transaction:

ALTER TABLE Hello COMPACT 'major';

This compaction should make the data visible. Note that it runs asynchronously, so it takes some time before the data is compacted.
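If you want to check whether the compaction has finished, Hive's SHOW COMPACTIONS statement lists pending and completed compaction requests:

hive> SHOW COMPACTIONS;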

parisni

You would need to add an action at the end to force it to run the query:

spark.sql("Select * From Hello").show()

(By default, show() displays 20 rows.)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.

pbahr