12

I'm trying to set up some Views in Hive that will take a date as a dynamic parameter. In my working below, I've swapped to using the hiveconf variable in the Select clause, so we can see what's going on, but the principle remains the same

According to this and this , I should be able to include a statement in " ${hiveconf:dateRangeFrom}" in my Create View statement, supply the hiveconf:dateRangeFrom variable at run time for maximum happiness, but this is just not happening - Hive appears to be using whatever value is assigned to the variable when the View is created and hard-coding it into the View definition, not substituting it at runtime as you might expect.

I've got a workaround, whereby I supply a parameter to a sql file that then creates all the views, substituting the desired value in, but this isn't sustainable

All the working is below, so you can see how I came to this conclusion. Any ideas?


1) Supply a hiveconf value to a simple query

(needs to be a date for the ultimate query)

hive -e "Select  ${hiveconf:dateRangeFrom} , unix_timestamp(${hiveconf:dateRangeFrom} , 'yyyy-MM-dd');"  --hiveconf dateRangeFrom='2014-01-01'

The date will be returned as suppled, and converted to a unix timestamp (eg "2014-01-01"=1388534400, "2014-09-12"=41047640). The script can be be run repeatedly with results changing accordingly with the parameter.

2) Create a view that returns this data

CREATE VIEW get_date AS 
SELECT  ${hiveconf:dateRangeFrom}, unix_timestamp(${hiveconf:dateRangeFrom} , 'yyyy-MM-dd');

This returns the error:

FAILED: ParseException line 2:8 cannot recognize input near '$' '{' 'hivevar' in  select clause

Presumably because it is trying to do a replacement, but the ${hivevar:dateRangeFrom} variable has not been initialized at this point

According to: Creating Views in Hive with parameter and http://mail-archives.apache.org/mod_mbox/hive-user/201205.mbox/%3CBAY151-W9BC976D584FD172E7D70BC0160@phx.gbl%3E

Then variables can be used in Hive views, long as quotes are used around them:

CREATE VIEW get_date AS 
SELECT  "${hiveconf:dateRangeFrom}", unix_timestamp("${hiveconf:dateRangeFrom}" , 'yyyy-MM-dd');

This allows the view to be created, so trying to call the view using a parameter:

hive -e "Select  * from get_date"  --hiveconf dateRangeFrom='2014-01-01'

just returns the variable name:

${hiveconf:dateRangeFrom}       NULL
    Time taken: 20.614 seconds, Fetched: 1 row(s)

Using single quotes instead:

DROP VIEW get_date;
CREATE VIEW get_date AS 
SELECT  '${hiveconf:dateRangeFrom}', unix_timestamp('${hiveconf:dateRangeFrom} ', 'yyyy-MM-dd');

Gives the same result, just the variable name.

3) Create a view in an interactive session with the variable already set

SET hiveconf:dateRangeFrom="2014-02-01";

Rebuild the original view, with the variables without quotes

DROP VIEW get_date;
CREATE VIEW get_date AS 
SELECT  ${hiveconf:dateRangeFrom}, unix_timestamp(${hiveconf:dateRangeFrom} , 'yyyy-MM-dd');

Then calling "select * from get_date;" from within the session gives the expected result.

As does calling from the command line, with the same parameter value:

hive -e "Select  * from get_date;"  --hiveconf dateRangeFrom='2014-02-01'

However, if we call the view with a different parameter, then we still get the original answer:

hive -e "Select  * from get_date;"  --hiveconf dateRangeFrom='2014-09-12'

    2014-02-01      1391212800
    Time taken: 24.773 seconds, Fetched: 1 row(s)

If we set the variable inside a new session:

SET hiveconf:dateRangeFrom="2014-06-01";

or even not set it all, we still get the same result

Looking at the extended view definition, the reason is obvious:

hive> describe extended get_date;
OK
_c0                     string
_c1                     bigint

Detailed Table Information      Table(tableName:get_date, dbName:default, owner:
36015to, createTime:1410523149, lastAccessTime:0, retention:0, sd:StorageDescrip
tor(cols:[FieldSchema(name:_c0, type:string, comment:null), FieldSchema(name:_c1
, type:bigint, comment:null)], location:null, inputFormat:org.apache.hadoop.mapr
ed.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequen
ceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:nu
ll, serializationLib:null, parameters:{}), bucketCols:[], sortCols:[], parameter
s:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValu
eLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{
transient_lastDdlTime=1410523149}, ***viewOriginalText:SELECT  "2014-02-01", unix_t
imestamp("2014-02-01" , 'yyyy-MM-dd'), viewExpandedText:SELECT  "2014-02-01", un
ix_timestamp("2014-02-01" , 'yyyy-MM-dd')***, tableType:VIRTUAL_VIEW)
Time taken: 0.123 seconds, Fetched: 4 row(s)

The variable substitution took place when the view was created, and hard-coded that date into the definition:

viewOriginalText:SELECT "2014-02-01", unix_t imestamp("2014-02-01" , 'yyyy-MM-dd'), viewExpandedText:SELECT "2014-02-01", un ix_timestamp("2014-02-01" , 'yyyy-MM-dd')

4) Switch off variable subsitution

Hive is clearly putting in the current value of the variable at run-time, so I tried switching it off and recreating the query:

hive> set hive.variable.substitute;
hive.variable.substitute=true
hive> set hive.variable.substitute = false;
hive> set hive.variable.substitute;
hive.variable.substitute=false

The Create View statement still fails with the same error:

FAILED: ParseException line 2:8 cannot recognize input near '$' '{' 'hiveconf' in select clause

5) Workaround

If we create a sql file that creates the views, testParam.sql, we can sort of get around the problem:

DROP VIEW get_date;
CREATE VIEW get_date AS
SELECT  ${hivevar:dateRangeFrom}, unix_timestamp(${hivevar:dateRangeFrom} , 'yyyy-MM-dd');
SELECT * FROM get_date;

Calling that from the command line gives the expected results:

hive -f testParam.sql --hiveconf dateRangeFrom='2014-08-01'
2014-08-01      1406847600
Time taken: 20.763 seconds, Fetched: 1 row(s)


hive -f testParam.sql --hiveconf dateRangeFrom='2014-09-12'
2014-09-12      1410476400
Time taken: 19.74 seconds, Fetched: 1 row(s)    

This does work, and will be fine for now, but is hardly ideal for a distributed, multi-user environment. Looking at the view meta-data, we can see that the view is always destroyed and rebuilt with the latest parameters:

transient_lastDdlTime=1410525287}, viewOriginalText:SELECT  '2014-09-12', unix_timestamp('2014-09-12' , 'yyyy-MM-dd'), viewExpandedText:SELECT  '2014-09-12', unix_timestamp('2014-09-12' , 'yyyy-MM-dd'), tableType:VIRTUAL_VIEW)

So, how to create a view that can be supplied with dynamic parameters at runtime without constantly rebuilding it

Community
  • 1
  • 1
TobyEvans
  • 1,431
  • 2
  • 21
  • 27
  • In the try number 1) you can pass variables directly from the shell: dateRangeFrom=2014-01-01; hive -e "Select '$dateRangeFrom', unix_timestamp('$dateRangeFrom' , 'yyyy-MM-dd');" – leftjoin Sep 29 '16 at 14:24
  • I must say the two references that you give are of somewhat dubious quality. It may very well not really be possible. – Dennis Jaheruddin Aug 08 '17 at 13:57
  • probably - it was a while ago now, but I seem to remember I went for the unsustainable version, supplying a variable to a sql file, which then embedded the supplied value into the view ... – TobyEvans Aug 23 '17 at 15:33
  • Is anybody aware on whether this is present in newer hive versions? - we seem to have no luck... – Irene May 17 '19 at 10:42
  • Irene - you probably want to be using Spark/Presto now ...\ – TobyEvans Jul 16 '20 at 15:02

2 Answers2

0

How are you defining daterangeFrom? I think daterange from can be dynamically generated from current_date function by adding and subtracting the days based on your requirement. You can simply use hive functions for that.

Manu Gupta
  • 820
  • 6
  • 20
0

I dont if this is what you're looking for! if you are passing values from a bash script, this should do the job:

dateRangeFrom=$(date +"%Y-%m-%d")
hive -e "Select  '${dateRangeFrom}' , unix_timestamp('${dateRangeFrom}' , 'yyyy-MM-dd');"

If you want to set the value in hive script itself you can do something like this

hive -e "SET hivevar:dateRangeFrom=2017-11-21;USE mydb; Select  '${dateRangeFrom}' , unix_timestamp('${dateRangeFrom}' , 'yyyy-MM-dd');"

If you want to store the same hive query in a HQL file and run it externally, then you need to pass it like this

hive -f /abc/user/script.hql --hivevar dateRangeFrom=2017-11-21
Rohit Nimmala
  • 1,459
  • 10
  • 28