Hive version: 3.1.0.3.1.4.0-315, Spark version: 2.3.2.3.1.4.0-315
Basically, I am trying to read transactional table data from Spark. According to this page (https://stackoverflow.com/questions/50254590/how-to-read-orc-transaction-hive-table-in-spark), a transactional table has to be compacted first, so I want to try this approach.
I am new to this and tried running compaction on the delta files, but the state always shows "initiated" and the compaction never completes. This happens for both major and minor compaction. Any help would be highly appreciated.
- Is this a good approach?
- How can I monitor the progress of the compaction job other than with SHOW COMPACTIONS? The only related line I can see in hiveserver_stdout.log is "Compaction enqueued with id 1".
- Generally, how long does a compaction take to complete?
- Is there any way to stop a compaction?
TIA.
[Edited]
SHOW COMPACTIONS;
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| compactionid | dbname | tabname | partname | type | state | workerid | starttime | duration | hadoopjobid |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
| CompactionId | Database | Table | Partition | Type | State | Worker | Start Time | Duration(ms) | HadoopJobId |
| 1 | tmp | shop_na2 | dt=2014-00-00 | MAJOR | initiated | --- | --- | --- | --- |
| 2 | tmp | na2_check | dt=2014-00-00 | MINOR | initiated | --- | --- | --- | --- |
+---------------+-----------+----------------+----------------+--------+------------+-----------+-------------+---------------+--------------+
3 rows selected (0.408 seconds)
The same SHOW COMPACTIONS result has been showing for the past 36 hours, even though the retention period is set to 86400 seconds.
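
For reference, here is a minimal sketch of the kind of statements involved, assuming the compactions were requested manually with ALTER TABLE ... COMPACT. The database, table, and partition names are taken from the output above; the exact statements that were run may have differed.

```sql
-- Request a major compaction on one partition
-- (names taken from the SHOW COMPACTIONS output; assumed, not confirmed).
ALTER TABLE tmp.shop_na2 PARTITION (dt='2014-00-00') COMPACT 'major';

-- Request a minor compaction on the other table's partition.
ALTER TABLE tmp.na2_check PARTITION (dt='2014-00-00') COMPACT 'minor';

-- List queued, running, and recently finished compactions with their state.
SHOW COMPACTIONS;
```

Each ALTER TABLE statement only enqueues a request, which matches the "Compaction enqueued with id 1" log line and the "initiated" state shown above.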