0

I have a hive table which I am querying thru Impala and I have a very basic issue. The table has time duration as a string and sometimes due to error the value is not populated and it's empty. When I use a query like the select cast(duration as decimal) from my table then if it encounters null in the sample, it fails with an error like UDF ERROR. STRING TO DECIMAL PARSE FAILED.

So my question is how to handle null here. I want Impala to ignore those rows which has null or simply treat as 0 rather than coming to a grinding halt.

I tried running with the -c option at impala-shell to ignore the error but it did not help. When I try to use coalesce to return 0 instead of null it gives different error. when I try ZEROIFNULL it gives another error. Nothing seems to be working. This seems a very basic issue so I assume there should be some good way to handle nulls. Unfortunately can not find anything even if I google for hours. so please help. :)

npr73
  • 1
  • 1
  • 3

3 Answers3

0

This should work:

select 
    case col
        when null then 0
        when '' then 0
        else cast(col as decimal(15, 2))
    end as col
from table;
vc 74
  • 37,131
  • 7
  • 73
  • 89
  • Thanks.. for the suggestion.. I played around with the case statement as you suggested and then figured out what I was expecting as null is not exactly null but just empty string.. so even "when null " not able to get it.. need to figure out if its not null. is it just an empty string and how to handle it. – npr73 Jan 22 '22 at 16:59
  • @npr73 Answer updated after your comment – vc 74 Jan 23 '22 at 09:54
0

After lots of research I realized all the standard way to handle null was not working as the data which I was "thinking" to be null was not actually null but empty string. Seems like Impala treats null and empty string differently. Found another question on this board another post and that helped to handle empty string like below: trim(duration) not like '' using above I was able to exclude those rows which has empty string. thanks for reading.

npr73
  • 1
  • 1
  • 3
0

I prefer to use regex and nullif for these situations. Regex to pull out the decimal value and nullif to account for the empty string situation.

select cast(nullif(regexp_extract(null, '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;

select cast(nullif(regexp_extract('', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;

select cast(nullif(regexp_extract('dirtydata', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbenull;

select cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal) as shouldroundto1;

select cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1)) as shouldbe1_4;

If you need your final answer to be zero, then just wrap it in a zeroifnull function.

select zeroifnull(cast(nullif(regexp_extract('', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1))) as shouldbezero;

select zeroifnull(cast(nullif(regexp_extract('1.4', '[\\+-]?\\d+\\.?\\d*', 0), '') as decimal(2,1))) as shouldbe1_4;
Keith Sirmons
  • 8,271
  • 15
  • 52
  • 75