115

I'm looking for the SQL equivalent of SET varname = value in Hive QL

I know I can do something like this:

SET CURRENT_DATE = '2012-09-16';
SELECT * FROM foo WHERE day >= @CURRENT_DATE

But then I get this error:

character '@' not supported here

Lukas Würzburger
  • 6,543
  • 7
  • 41
  • 75
user1678312
  • 1,309
  • 3
  • 10
  • 11
  • Unfortunately, there is no safe way to set a string variable because if someone runs the query without setting the variable, then the string will simply use the variable call as a string. :( – combinatorist Apr 09 '18 at 21:35

10 Answers10

224

You need to use the special hiveconf for variable substitution. e.g.

hive> set CURRENT_DATE='2012-09-16';
hive> select * from foo where day >= ${hiveconf:CURRENT_DATE}

similarly, you could pass on command line:

% hive -hiveconf CURRENT_DATE='2012-09-16' -f test.hql

Note that there are env and system variables as well, so you can reference ${env:USER} for example.

To see all the available variables, from the command line, run

% hive -e 'set;'

or from the hive prompt, run

hive> set;

Update: I've started to use hivevar variables as well, putting them into hql snippets I can include from hive CLI using the source command (or pass as -i option from command line). The benefit here is that the variable can then be used with or without the hivevar prefix, and allow something akin to global vs local use.

So, assume have some setup.hql which sets a tablename variable:

set hivevar:tablename=mytable;

then, I can bring into hive:

hive> source /path/to/setup.hql;

and use in query:

hive> select * from ${tablename}

or

hive> select * from ${hivevar:tablename}

I could also set a "local" tablename, which would affect the use of ${tablename}, but not ${hivevar:tablename}

hive> set tablename=newtable;
hive> select * from ${tablename} -- uses 'newtable'

vs

hive> select * from ${hivevar:tablename} -- still uses the original 'mytable'

Probably doesn't mean too much from the CLI, but can have hql in a file that uses source, but set some of the variables "locally" to use in the rest of the script.

Ivelin
  • 12,293
  • 5
  • 37
  • 35
libjack
  • 6,403
  • 2
  • 28
  • 36
  • 1
    This is passing a parameter from command line I'm developing queries in Karmasphere and need to set a few contants in the begging so that I don't hard code dates 10 times in my script. Is something like that possible? – user1678312 Sep 19 '12 at 22:16
  • works both ways, if you do `set CURRENT_DATE='2012-09-16';` you can refer to it later with `${hiveconf:CURRENT_DATE}` – libjack Sep 19 '12 at 22:46
  • updated my answer to clarify that, plus added additional info about other variables. see the [docs](http://hive.apache.org/docs/r0.9.0/language_manual/var_substitution.html) for more info. – libjack Sep 19 '12 at 22:54
  • 1
    How does this work if I have multiple Hive jobs running simultaneously? Will they end up picking up values from each other? In automation I am constructing an HQL file by prepending it with some SET statements. I want to make sure if I submit two jobs at the same time that use the same variable names, one job won't pick up values from the other job. The semantics here aren't clear from your answer. – MattD Apr 04 '14 at 13:56
  • 5
    this works for me on Hive server. However, I've setup some integration tests on local machine in IntelliJ. I keep getting following error when trying to use variable this way:```FAILED: ParseException line x:y cannot recognize input near '$' '{' 'hiveconf' in expression specification``` – chepukha Aug 14 '14 at 16:06
  • worth adding that hivevar variable (i.e. default namespace one) can be passed like that: `hive --hivevar var=value -f yourscript.q` Also it's --hiveconf, not -hiveconf – Kranach Jan 26 '15 at 19:26
  • Errata: Note: The variant "-hiveconf" is supported as well as "--hiveconf". -_- – Kranach Jan 26 '15 at 19:37
  • @Kranach good catch, totally missed that one; I generally use the "--hiveconf" version. – libjack Jan 26 '15 at 22:06
  • @libjack Can we write this command in hql script? "set hivevar:maxDT=(select max(col1) from tableA);" – DatabaseCoder Jul 26 '19 at 05:49
  • 1
    @DatabaseCoder To my knowledge, nothing like that will work. Whenever I need something like that, I have to do the first query and then pass in via "--hiveconf" – libjack Aug 08 '19 at 19:57
  • This doesn't work in current version of Hive and the response is misleading for many reasons: 1. `CURRENT_DATE` is a reserved and predefined constant and function in Hive 2.x 2. hiveconf: namespace is reserved for configuration values and protected in Hive 3.x so you should not be able to change it (runtime error) unless you put this in the whitelist (bad practice). Use hivevar: instead! 3. set does not evaluates the expression, so in the example '${hiveconf:CURRENT_DATE}' will be ''2012-09-16'' (duplicated quote) and this gives an error – Luis Vazquez Jul 24 '20 at 16:50
35

Most of the answers here have suggested to either use hiveconf or hivevar namespace to store the variable. And all those answers are right. However, there is one more namespace.

There are total three namespaces available for holding variables.

  1. hiveconf - hive started with this, all the hive configuration is stored as part of this conf. Initially, variable substitution was not part of hive and when it got introduced, all the user-defined variables were stored as part of this as well. Which is definitely not a good idea. So two more namespaces were created.
  2. hivevar: To store user variables
  3. system: To store system variables.

And so if you are storing a variable as part of a query (i.e. date or product_number) you should use hivevar namespace and not hiveconf namespace.

And this is how it works.

hiveconf is still the default namespace, so if you don't provide any namespace it will store your variable in hiveconf namespace.

However, when it comes to referring a variable, it's not true. By default it refers to hivevar namespace. Confusing, right? It can become clearer with the following example.

If you do not provide namespace as mentioned below, variable var will be stored in hiveconf namespace.

set var="default_namespace";

So, to access this you need to specify hiveconf namespace

select ${hiveconf:var};

And if you do not provide namespace it will give you an error as mentioned below, reason being that by default if you try to access a variable it checks in hivevar namespace only. And in hivevar there is no variable named var

select ${var}; 

We have explicitly provided hivevar namespace

set hivevar:var="hivevar_namespace";

as we are providing the namespace this will work.

select ${hivevar:var}; 

And as default, workspace used during referring a variable is hivevar, the following will work too.

select ${var};
Mario Becerra
  • 514
  • 1
  • 6
  • 16
Gaurang Shah
  • 11,764
  • 9
  • 74
  • 137
9

Have you tried using the dollar sign and brackets like this:

SELECT * 
FROM foo 
WHERE day >= '${CURRENT_DATE}';
Jake1164
  • 12,291
  • 6
  • 47
  • 64
YABADABADOU
  • 1,238
  • 1
  • 16
  • 38
  • This is the only working answer for me. The quotes are required in my ambari hive interface. – Laurens Koppenol Dec 07 '16 at 10:20
  • there are two things hivevar and hiveconf- both are explained in detail [here](https://stackoverflow.com/questions/37997008/what-is-the-difference-between-hivevar-and-hiveconf) – Rahul Sharma Jul 28 '17 at 22:16
4

Just in case someone needs to parameterize hive query via cli.

For eg:

hive_query.sql

SELECT * FROM foo WHERE day >= '${hivevar:CURRENT_DATE}'

Now execute above sql file from cli:

hive --hivevar CURRENT_DATE="2012-09-16" -f hive_query.sql
Abdul Mannan
  • 1,072
  • 12
  • 19
3

Two easy ways:

Using hive conf

hive> set USER_NAME='FOO';
hive> select * from foobar where NAME = '${hiveconf:USER_NAME}';

Using hive vars

On your CLI set vars and then use them in hive

set hivevar:USER_NAME='FOO';

hive> select * from foobar where NAME = '${USER_NAME}';
hive> select * from foobar where NAME = '${hivevar:USER_NAME}';

Documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

Savio D'Souza
  • 201
  • 3
  • 6
2

One thing to be mindful of is setting strings then referring back to them. You have to make sure the quotes aren't colliding.

 set start_date = '2019-01-21';
 select ${hiveconf:start_date}; 

When setting dates then referring to them in code as the strings can conflict. This wouldn't work with the start_date set above.

 '${hiveconf:start_date}'

We have to be mindful of not setting twice single or double quotes for strings when referring back to them in the query.

Shaun
  • 79
  • 5
1

There are multiple options to set variables in Hive.

If you're looking to set Hive variable from inside the Hive shell, you can set it using hivevar. You can set string or integer datatypes. There are no problems with them.

SET hivevar:which_date=20200808;
select ${which_date};

If you're planning to set variables from shell script and want to pass those variables into your Hive script (HQL) file, you can use --hivevar option while calling hive or beeline command.

# shell script will invoke script like this
beeline --hivevar tablename=testtable -f select.hql
-- select.hql file
select * from <dbname>.${tablename};
Piyush Patel
  • 1,646
  • 1
  • 14
  • 26
0

Try this method:

set t=20;
select *
from myTable
where age > '${hiveconf:t}'; 

it works well on my platform.

Alien
  • 15,141
  • 6
  • 37
  • 57
typhoonbxq
  • 91
  • 1
  • 2
0

You can export the variable in shell script export CURRENT_DATE="2012-09-16"

Then in hiveql you like SELECT * FROM foo WHERE day >= '${env:CURRENT_DATE}'

Dileep Dominic
  • 499
  • 11
  • 23
-7

You can store the output of another query in a variable and latter you can use the same in your code:

set var=select count(*) from My_table;
${hiveconf:var};
Unheilig
  • 16,196
  • 193
  • 68
  • 98
Suman
  • 476
  • 5
  • 7