I configured Hive parallelism with the following hive-site.xml properties and restarted the cluster:
Property 1
Name: hive.exec.parallel
Value: true
Description: Run hive jobs in parallel
Property 2
Name: hive.exec.parallel.thread.number
Value: 8 (default)
Description: Maximum number of hive jobs to run in parallel
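For reference, here is a sketch of how these two entries would typically look in hive-site.xml. It assumes the standard Hadoop-style property layout (name, value, description) inside the file's top-level configuration element. The description strings are copied from the list above, and any other properties already in the file are omitted:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Allow independent stages of a single Hive query to run concurrently -->
  <property>
    <name>hive.exec.parallel</name>
    <value>true</value>
    <description>Run hive jobs in parallel</description>
  </property>
  <!-- Upper bound on how many of those stages may run at the same time -->
  <property>
    <name>hive.exec.parallel.thread.number</name>
    <value>8</value>
    <description>Maximum number of hive jobs to run in parallel</description>
  </property>
</configuration>
```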
To test parallelism, I set up the following two test cases:
1. A single query in file.hql, run as hive -f file.hql:
SELECT COL1, COL2 FROM TABLE1
UNION ALL
SELECT COL3, COL4 FROM TABLE2
Result:
When hive.exec.parallel = true, Time taken: 28.015 seconds, Total MapReduce CPU Time Spent: 3 seconds 10 msec
When hive.exec.parallel = false, Time taken: 24.778 seconds, Total MapReduce CPU Time Spent: 3 seconds 90 msec
2. Two independent queries in separate files, run as nohup hive -f file1.hql & nohup hive -f file2.hql:
select count(1) from t1 -> file1.hql
select count(1) from t2 -> file2.hql
Result:
When hive.exec.parallel = false, Time taken: 29.391 seconds, Total MapReduce CPU Time Spent: 1 seconds 890 msec
Questions:
How can I check whether the two test cases above are actually running in parallel? In the console, the output looks as if the queries were running sequentially.
Why is the time taken longer when hive.exec.parallel = true? How can I see whether multiple Hive stages are being utilized?
Thank you,