Handling null values on hive

Question

I have a column in hive with type double, but some of the rows are NULL when I do:

select columnA from table;

now, if I run the following, I get 0 for both queries:

select count(*) from table where columnA = "NULL";
select count(*) from table where columnA = NULL;

how can I count the rows in my table which are NULL?

score 6 · Accepted Answer · answered Aug 08 '13 at 15:39

6

The correct query is:

select count(*) from table where columnA is null;

answered Aug 08 '13 at 15:39

Olaf

6,249
1
19
37

Olaf, do you know how I could do this for every column in a table? The table has a large number of columns and I'd rather not have to run one query for every single column. – Gyan Veda Mar 07 '14 at 15:48
1

@user2932774: I have not tried that in Hive, but in a standard SQL you can use "case". See for reference: http://stackoverflow.com/questions/12789396/how-to-get-multiple-counts-with-one-sql-query – Olaf Mar 07 '14 at 16:30
but what does columnA = null stand for? It won't throw an exception, which shows "columnA = null" at least is working normally. – Jing He Aug 25 '17 at 13:06

score 5 · Answer 2 · answered Jan 20 '15 at 19:14

In Hive, count(*) counts all rows and count(columnA) will only count rows where columnA is non-NULL. If you would like to do multiple columns you could write the query as:

select count(*)-count(columnA), count(*)-count(columnB) from table;

and get the number of null values in each column. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Handling null values on hive

2 Answers2