1

I have code that looks like this:

set hivevar:foo=rand();
select ${hivevar:foo} from my_database.my_table;

I think, even with the variable substitution, rand() is still being called once per row. How do I just set foo to a random number once per query?

I can think of other solutions, but I'd like to keep everything contained to one hive query instead of patching together something uglier.

matthiasdenu
  • 323
  • 4
  • 18

2 Answers2

2

This code calls rand() for every row:

INSERT INTO dest_table
SELECT st.col1, st.col2, RAND() AS col3
FROM source_table st;

This code calls rand() only once:

WITH rand AS (
  SELECT RAND() as col3
)
INSERT INTO dest_table
SELECT st.col1, st.col2, rand.col3
FROM rand, source_table st;
VMykyt
  • 1,589
  • 12
  • 17
1

Call it in a one-row subquery and cross join with your query. The subquery should be executed once

select s.rnd, a.col, etc
  from my_database.my_table a
      cross join
  (select rand() as rnd) s
;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Awesome. This helped me understand what was going on and what I actually wanted to do: Take the cartesian product between a scalar and a vector, like what cross join does. Thanks! However, I think I'm going to use something similar to what @VMykyt has, because it seems like cross join takes a little longer to run. – matthiasdenu Sep 27 '18 at 16:41
  • 1
    @matthiasdenu It is the same cross join because `FROM rand, source_table` without `ON` condition is a cross join. You better to use ANSI join syntax – leftjoin Sep 27 '18 at 16:45
  • @matthiasdenu See also this for better understanding: https://stackoverflow.com/a/46843832/2700344 – leftjoin Sep 27 '18 at 16:46
  • @matthiasdenu Normally it should run map-join, if not, check this:https://stackoverflow.com/a/49154414/2700344 – leftjoin Sep 27 '18 at 16:51