3

I have a column in hive with type double, but some of the rows are NULL when I do:

select columnA from table;

now, if I run the following, I get 0 for both queries:

select count(*) from table where columnA = "NULL";
select count(*) from table where columnA = NULL;

how can I count the rows in my table which are NULL?

user1745713
  • 781
  • 4
  • 10
  • 16

2 Answers2

6

The correct query is:

select count(*) from table where columnA is null;
Olaf
  • 6,249
  • 1
  • 19
  • 37
  • Olaf, do you know how I could do this for every column in a table? The table has a large number of columns and I'd rather not have to run one query for every single column. – Gyan Veda Mar 07 '14 at 15:48
  • 1
    @user2932774: I have not tried that in Hive, but in a standard SQL you can use "case". See for reference: http://stackoverflow.com/questions/12789396/how-to-get-multiple-counts-with-one-sql-query – Olaf Mar 07 '14 at 16:30
  • but what does columnA = null stand for? It won't throw an exception, which shows "columnA = null" at least is working normally. – Jing He Aug 25 '17 at 13:06
5

In Hive, count(*) counts all rows and count(columnA) will only count rows where columnA is non-NULL. If you would like to do multiple columns you could write the query as:

select count(*)-count(columnA), count(*)-count(columnB) from table;

and get the number of null values in each column. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Aaron
  • 151
  • 1
  • 4