
I am trying to improve Hive query speed using the techniques below. These config changes increase speed, and I want to use them for all the queries I execute. However, I would like some input on whether these settings could have an adverse effect if applied across all queries.

set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

Vectorized query execution improves the performance of operations like scans, aggregations, filters, and joins by processing them in batches of 1024 rows at once instead of a single row at a time. Introduced in Hive 0.13, this feature can significantly improve query execution time.
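A minimal sketch of how one might check whether a query actually runs vectorized. It assumes the data is stored as ORC, which the Hive documentation lists as a requirement for vectorization (later releases relax this for some other formats), and that the cluster runs Hive 2.3 or later, where EXPLAIN VECTORIZATION is available. The table name tweets_orc is hypothetical and only used for illustration.

```sql
-- Hypothetical ORC copy of the tweets table; the name is illustrative only.
CREATE TABLE tweets_orc STORED AS ORC AS
SELECT * FROM tweets;

SET hive.vectorized.execution.enabled = true;

-- Hive 2.3+ only: reports whether the plan's operators run in vectorized mode.
EXPLAIN VECTORIZATION SUMMARY
SELECT COUNT(*) FROM tweets_orc;
```

On older versions, a plain EXPLAIN of the same query should show "Execution mode: vectorized" on the stages that are vectorized.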

set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
analyze table tweets compute statistics for columns;

Enable cost-based optimization (CBO). The analyze statement collects the column statistics that the optimizer relies on.
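Because collecting column statistics can be slow on large tables, one option is to gather them only for the columns that matter to the optimizer, such as those used in joins and filters. The following is a sketch under that assumption; the column names user_id and created_at are hypothetical and not taken from the actual tweets table.

```sql
-- Basic table-level statistics (row count, data size).
ANALYZE TABLE tweets COMPUTE STATISTICS;

-- Column statistics restricted to selected columns.
-- user_id and created_at are hypothetical column names.
ANALYZE TABLE tweets COMPUTE STATISTICS FOR COLUMNS user_id, created_at;

-- Inspect the statistics stored for one column.
DESCRIBE FORMATTED tweets user_id;
```

The statistics are not refreshed by this sketch when the data changes, so the ANALYZE statements would need to be rerun after significant loads.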

set hive.execution.engine=tez;

Use the Tez execution engine.

sjd
  • Column statistics can be expensive to fetch, and you do not need them for all tables. Everything else is good to keep for all queries. – leftjoin Nov 21 '18 at 12:09
  • @leftjoin Thanks! Yes, analyze table was taking some time to complete. – sjd Nov 22 '18 at 07:03
  • Please have a look at this answer also: https://stackoverflow.com/a/40783621/2700344 – leftjoin Nov 22 '18 at 07:16
