5

I have a table in Hive with with 5 columns i.e. email, a_first_date, b_first_date, c_first_date, d_first_date.

a, b, c, d are 4 different actions which a user can take and 4 columns in the table above indicate the date on which the user did the first respective action. For example, the value in 'a_first_date' has the date on which the user did action a.

Output: What I want is 2 columns having email, overall_first_date i.e. on what date user did his first action?

Example Table: (Assume all values are BIGINT type other than email)

email,a_first_date,b_first_date,c_first_date,d_first_date
abc,20140707,20140702,20140801,20140907
xyz,20140107,20140822,20140201,20141007

Output:

email,overall_first_date
abc,20140702
xyz,20140107

Possible couple of solutions are to write a UDF or compare these values with each other using IF ELSE and then find the minimum but that would involve a lot of comparisons.

Alternatively I can do a:

select email, min(action) as overall_first_date from

(
select email, a_first_date as action from mytable
UNION ALL
select email, b_first_date as action from mytable
UNION ALL
select email, c_first_date as action from mytable
UNION ALL
select email, d_first_date as action from mytable
) q1

GROUP BY email 

but again this is not a good way.

Could anyone please suggest a better way to achieve this?

Amar
  • 3,825
  • 1
  • 23
  • 26

3 Answers3

8

Use the function least(). For example; Select *, least(col1,col2,col3) as minofcol from Tablename;

Yerandy
  • 81
  • 1
  • 1
6

You could use Hive's array functions:

select email, 
       sort_array(array(a_first_date, b_first_date, c_first_date, d_first_date))[0]  as overall_first_date
from table;

I'm not sure how this compares to a CASE statement performance-wise. Since you don't have a lot of columns both are equally simple.

eebbesen
  • 5,070
  • 8
  • 48
  • 70
Michael
  • 371
  • 1
  • 7
1

Why not use a case statement? This seems to have been thoroughly discussed in SQL here: https://dba.stackexchange.com/questions/21542/what-is-the-most-efficient-way-to-get-the-minimum-of-multiple-columns-on-sql-ser

Community
  • 1
  • 1
JJFord3
  • 1,976
  • 1
  • 25
  • 40