0

I have a table which has 9 columns. Below is the structure of it

Raw Table Structure

I need the min and max of these columns for a row excluding zeros. Below is the required table structure

Output Required

If you see the columns min and max, min is minimum of 7 cols (col1 to col7) in a particular row excluding zero and max is the maximum of the 7 cols (col1 to col7) for that row.

Please help me to accomplish this in hive (hue).

KVHP
  • 33
  • 1
  • 7

2 Answers2

1

You can use least and greatest to get the min and max, and use when to remove 0.

select *,
    least(
        case when col1 != 0 then col1 else 99999999 end,
        case when col2 != 0 then col2 else 99999999 end,
        case when col3 != 0 then col3 else 99999999 end,
        case when col4 != 0 then col4 else 99999999 end,
        case when col5 != 0 then col5 else 99999999 end,
        case when col6 != 0 then col6 else 99999999 end,
        case when col7 != 0 then col7 else 99999999 end,
    ) as `Min`
    greatest(
        case when col1 != 0 then col1 else -99999999 end,
        case when col2 != 0 then col2 else -99999999 end,
        case when col3 != 0 then col3 else -99999999 end,
        case when col4 != 0 then col4 else -99999999 end,
        case when col5 != 0 then col5 else -99999999 end,
        case when col6 != 0 then col6 else -99999999 end,
        case when col7 != 0 then col7 else -99999999 end
    ) as `Max`
from mytable
mck
  • 40,932
  • 13
  • 35
  • 50
  • 1
    This will not work because least function is Fixed to return NULL when one or more arguments are NULL. For example, if a column has zero as value then case statement will give NULL as output which makes the least or greatest function result as NULL. But NULL is not the desired output. – KVHP Apr 03 '21 at 10:33
  • Oh, I didn't realise that Hive has this behaviour. How about adding `else 99999999` for `least` and `else -99999999` for `greatest`? – mck Apr 03 '21 at 10:55
  • See https://stackoverflow.com/questions/37530880/avoiding-null-return-value-with-greatest-function for more examples – mck Apr 03 '21 at 10:56
  • There is no point in giving 99999999 because if the sample values from col1 to col7 are 0,25,2,587,0,1,45 we need the result to be 1 not 99999999. – KVHP Apr 04 '21 at 10:55
  • @Pushpakkvh not sure if you understood my comment, but could you have a look at the edited answer please? – mck Apr 04 '21 at 11:45
  • I got it but what will happen in the extreme case (if there are numbers greater than 99999999)? – KVHP Apr 05 '21 at 06:29
  • @KVHP then you can use an even larger number for this purpose. For an integer type column, the largest possible number is 2147483647. – mck Apr 05 '21 at 08:23
  • Do we have any other way, apart from assigning a bigger number manually and doing it. Another thing is, do we have any other solution that will improve the efficiency? – KVHP Apr 05 '21 at 08:30
  • That's the only way that I can think of... Using the biggest possible number allowed by the datatype of that column should be a reliable and efficient way, as suggested in https://stackoverflow.com/questions/37530880/avoiding-null-return-value-with-greatest-function – mck Apr 05 '21 at 08:31
0

You can use an explicit case expression:

select (case when col1 <> 0 and col1 >= col2 and col1 >= col3 and col1 > =col4 and col1 >= col5 and col1 >= col6 and col1 >= col7
             then col1
             . . .
        end)

This is a lot of typing. It probably suggests that your data is stored incorrectly -- you should have separate rows for each column. You can mimic this by expanding the data and reaggrating:

select name, state, col1, col2, col3, col4, col5, col6, col7,
       min(case when col1 <> 0 then col1 end) as min_value,
       max(case when col1 <> 0 then col1 end) as max_value,
from ((select t.*, col1 as col from t) union all
      (select t.*, col2 as col from t) union all
      (select t.*, col3 as col from t) union all
      (select t.*, col4 as col from t) union all
      (select t.*, col5 as col from t) union all
      (select t.*, col6 as col from t) union all
      (select t.*, col7 as col from t)
     ) x
group by name, state, col1, col2, col3, col4, col5, col6, col7;
      
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This might work but this will increase the process time very much as we are considering the same columns and doing union. – KVHP Apr 04 '21 at 10:58