I need to create a configurable view whose result depends on a configuration value. My first approach was to use Hive variables and put a variable in the view definition, but this doesn't work: when the view is created, it takes the actual value of the variable at that moment, so the view is static and can't be configured afterwards. My second approach was to call a UDF and access the variable from inside it. I think this will work, but I don't know how to write it properly. Can you please share your ideas, and any experience you have, on how you would solve this problem?

sujit
Aleksejs R

2 Answers

UPDATE

Configurability of views seems to be possible through UDFs. I am striking out my earlier answer in full.

To demonstrate this, I created a simple UDF that outputs a random double.

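package com.example.hive.udf;

import java.util.Random;
import org.apache.hadoop.hive.ql.exec.UDF;

public final class MyRandom extends UDF {
    // The argument is ignored; it only gives the function something to be called with.
    public double evaluate(final double d) {
        Random r = new Random(System.currentTimeMillis());
        return r.nextDouble();
    }
}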

I then registered the jar with Hive and created the function:

hive> add jar <my-local-path>/myudf.jar;
hive> create temporary function myrand as 'com.example.hive.udf.MyRandom';

Assuming I already have a simple table sample, when I create a view as follows:

CREATE OR REPLACE VIEW view as 
select *, myrand(1) from sample;

On successive select * queries against the view, I get different results. This means the UDF is called on every statement that involves the view.

Keep in mind that the UDF can't be passed parameters using ${hiveconf:XXX}, since this will be evaluated and baked into the view definition.

So the way to achieve this configurability is to read the configuration inside the UDF class code. This answer describes how the JobConf can be accessed within Hive UDFs.
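The difference between the two approaches can be sketched in plain Java. This is only a model of the behaviour described above, not Hive code: the class `ViewConfigSketch`, the helper `substitute`, the variable name `my.limit` and the column `x` are all made up for illustration. It shows that a textually substituted value is frozen when the statement is stored, while a value looked up at call time follows later changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ViewConfigSketch {
    // Matches ${hiveconf:name}; a simplified stand-in for variable substitution.
    private static final Pattern VAR = Pattern.compile("\\$\\{hiveconf:([^}]+)\\}");

    // Replaces every ${hiveconf:name} with the value the name has right now.
    static String substitute(String sql, Map<String, String> conf) {
        Matcher m = VAR.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = conf.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("my.limit", "10"); // hypothetical variable

        // Creating the view: the stored text already contains the value.
        String storedView =
            substitute("select * from sample where x < ${hiveconf:my.limit}", conf);

        // Function-style access: the value is read each time it is needed.
        Supplier<String> lookup = () -> conf.get("my.limit");

        conf.put("my.limit", "99"); // configuration changes after view creation

        System.out.println(storedView);   // still contains 10
        System.out.println(lookup.get()); // sees 99
    }
}
```

In a real UDF the lookup would go through whatever configuration access Hive makes available inside the function, which is the part the rest of this thread discusses.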

I have raised a related question on whether it's possible to access Hive variables within a UDF. Consider helping if you have the answer (and if it's still unanswered).

Earlier answer (struck out, superseded by the update above):

Alex, I think this is not possible.

The reason is the same as why the first approach doesn't work.

The select query that forms the view will always be evaluated at the time of view creation. So, even if you had passed some variable to a UDF that was used in the select query to form the view, the UDF will be evaluated at that point itself to materialise the view and view contents are fixed at that time.

The next time you access the view (without create or replace), the UDF won't be re-invoked.

So, unless you are willing to run a create or replace view statement before every query that uses the view, there is no way to achieve a configurable view.

See this answer to observe how the view is always materialised irrespective of variables used. Same will be the case with UDF's.

sujit

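@Alex - on your second approach, below is a way to access a Hive conf value inside a GenericUDF, here in its initialize() method. You can store the value in an instance variable and use it later in evaluate().

private String myconf;

@Override
public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    // SessionState.get() can return null, hence the check.
    SessionState ss = SessionState.get();
    if (ss != null) {
        HiveConf conf = ss.getConf();
        myconf = conf.get("my.hive.conf");
        System.out.println("sysout.myconf:" + myconf);
    }
    // Return type is an assumption: a UDF that returns the value as a string.
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
}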

The code was tested on Hive 1.2.

To test the code:

  1. Build the UDF jar.
  2. In the Hive CLI, execute the commands below:

    SET hive.root.logger=INFO,console;
    SET my.hive.conf=test;
    ADD JAR /path/to/the/udf/jar;
    CREATE TEMPORARY FUNCTION test_udf AS 'com.example.my.udf.class.qualified.classname';
    
Gyanendra Dwivedi