Hive partition query is scanning all partitions

Question

When I write the hive query like below

select count(*)
from order
where order_month >= '2016-11';

Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1

I am getting 5 mappers only it means reading required partitions only(2016-11 and 2016-12)

Same query I write using function

select count(*)
from order
where order_month >= concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)));

Note:

concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10))) = '2016-11'

Hadoop job information for Stage-1: number of mappers: 216; number of reducers: 1

this time it is reading all partitions {i.e. 2004-10 to 2016-12}. .

How to modify the query to read required partitions only.

Possible duplicate of [Hive partition pruning on computed column](http://stackoverflow.com/questions/35782664/hive-partition-pruning-on-computed-column) — Iulian Dogariu, Dec 10 '16 at 20:43
See also this question http://stackoverflow.com/questions/33369488/hive-query-runs-very-slowly-when-using-variables-in-where-condition/33384954 — Iulian Dogariu, Dec 10 '16 at 20:46

leftjoin · Answer 1 · 2020-02-12T07:15:47.167

0

unix_timestamp() function is non-deterministic and prevents proper optimization of queries - this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP and CURRENT_DATE.

Use current_date, also no need to calculate year and month separately:

where order_month >= substr(date_sub(current_date, 10),1,7)

edited Feb 12 '20 at 07:15

answered Feb 12 '20 at 06:57

leftjoin

36,950
8
57
116

Hive partition query is scanning all partitions

1 Answers1