0

I am trying to compute the avg/min/max of the record count for a view in hive

select avg(rec_count), max(rec_count), min(rec_count)

And trying to implement something like:

select avg(rec_count), max(rec_count), min(rec_count)
where rec_count in 
(select count(record_number) as rec_count from archives
group by record_number);

But I'm getting an error message:

FAILED: SemanticException Line 0:-1 Invalid column reference 'rec_count' in definition of SubQuery sq_1 [ rec_count in (select count(record_number) as rec_count from archives group by record_number) ] used as sq_1 at Line 2:18

I would like to know if I can do this without having to create another table/view that contains the grouped/aggregated values of the original table.

Sample data:

Blockquote

Record number 3031 has 4 records, 4050 has 6 records so the expected results that I'm hoping to get from the query would be:

avg = 5 min = 4 max = 6

Ben C Wang
  • 617
  • 10
  • 19

1 Answers1

1

I think you want something like:

select avg(rec_count), max(rec_count), min(rec_count)
from (select record_number, count(*) as rec_count
      from archives
      group by record_number
     ) a
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I thought the a was a typo, but got an error without it. What is the 'a' doing at the end? Edit: Ah nevermind, I found the answer. Thank you so much for the help! https://stackoverflow.com/questions/46811384/significance-of-random-letter-at-the-end-of-sub-query-in-from-clause-sql – Ben C Wang Jan 27 '19 at 22:44