0

I am not able to understand output of query run on hive

select count(*),
count(col1) as count, 
count( distinct col1) as distinct,
SUM (case when (cast(col1 as BIGINT) is null or cast(col1 as BIGINT) is not null )then 1 else 0 end) as total_count,
SUM (case when cast(col1 as BIGINT) is null then 1 else 0 end) as non_int_count,
SUM (case when cast(col1 as BIGINT) is not null then 1 else 0 end) as int_count,
SUM (case when cast(col1 as BIGINT) is null then 0 else 1 end) as int_count2,
FROM TABLE
where conditions

Result for this query in Hive is

count(*)   count   distinct total_count  non_int_count  int_count     int_count2
23030525  23030525 1631400  23030525        2            258898        258898

Shouldn't total_count=count(*)=(non_int_count+ int_count). ?

  • Wild guess: can it be the same issue with statistics: https://stackoverflow.com/a/39914232/2700344 – leftjoin Jun 30 '20 at 18:49
  • Another possible reason is `select cast("Null" as bigint)` will also yield NULL value, just like `select cast("" as bigint)` or `select cast("abc" as bigint)`. That could be something to check. – Sayon M Jul 09 '20 at 21:32

0 Answers0