4

In Pig, you can pass a configuration value from your Pig script to a Pig UDF via UDFContext. For example:

// in pig script
SET my.conf dummy-conf

// in UDF java code
Configuration conf = UDFContext.getUDFContext().getJobConf();
String myConf = conf.get("my.conf");

So, is there a similar way to pass configuration from a Hive script to a Hive UDF? For example, if I have set MY_CONF='foobar' in a Hive script, how can I retrieve that value in a Java UDF that needs to consume it?

JBT

4 Answers

2

Instead of extending the UDF class, you can try subclassing GenericUDF. This class has the following method, which you can override:

/**
 * Additionally setup GenericUDF with MapredContext before initializing.
 * This is only called in runtime of MapRedTask.
 *
 * @param context context
 */
public void configure(MapredContext context) {
}

MapredContext has a method, just like Pig's UDFContext, to retrieve the job configuration, so you could do the following:

@Override
public void configure(MapredContext context) {
    Configuration conf = context.getJobConf();  
}
Balduz
1

As of Hive 1.2, there are two approaches.

1. Overriding configure method from GenericUDF

@Override
public void configure(MapredContext context) {
    super.configure(context);
    someProp = context.getJobConf().get(HIVE_PROPERTY_NAME);
}

Approach 1 does not work in all cases: configure is only called when the UDF runs inside a MapReduce task. To make it work for every query, you have to force each query to run as a map/reduce job by setting the following (hive.fetch.task.conversion can be either minimal or none):

set hive.fetch.task.conversion=none;
set hive.optimize.constant.propagation=false;

With these properties set, you will hit major performance problems, especially for smaller queries.

2. Using SessionState

SessionState ss = SessionState.get();
if (ss != null) {
    this.hiveConf = ss.getConf();
    someProp = this.hiveConf.get(HIVE_PROPERTY_NAME);
    LOG.info("Got someProp: " + someProp);
}
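
One way to use both approaches together is to prefer the job configuration when configure was called and fall back to the session configuration otherwise. The lookup order can be sketched in plain Java as below. This is only an illustration under stated assumptions: ConfLookup and resolve are hypothetical names, and java.util.Map stands in for the JobConf and HiveConf objects so that the snippet runs without Hive on the classpath. In a real UDF the two maps would be replaced by context.getJobConf() and SessionState.get().getConf().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: models the "job conf first, session conf second" lookup.
// The maps are stand-ins for Hive's JobConf and HiveConf (an assumption made
// so that this sketch has no Hive dependency).
public final class ConfLookup {
    private ConfLookup() {}

    /**
     * Returns the value for key from jobConf if present, otherwise from
     * sessionConf, otherwise defaultValue. Either map may be null, which
     * mirrors configure() never being called or SessionState.get()
     * returning null.
     */
    public static String resolve(String key,
                                 Map<String, String> jobConf,
                                 Map<String, String> sessionConf,
                                 String defaultValue) {
        if (jobConf != null && jobConf.get(key) != null) {
            return jobConf.get(key);
        }
        if (sessionConf != null && sessionConf.get(key) != null) {
            return sessionConf.get(key);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> session = new HashMap<>();
        session.put("MY_CONF", "foobar");

        // No job conf available (e.g. the query ran as a fetch task).
        System.out.println(resolve("MY_CONF", null, session, "unset")); // prints "foobar"

        // Neither source has the key, so the default is used.
        System.out.println(resolve("OTHER", null, session, "unset"));   // prints "unset"
    }
}
```

Checking the job configuration first keeps the behaviour of approach 1 when a MapReduce task is running, while the fallback covers queries that never call configure.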
piet.t
rbyndoor
0

Go to the Hive command line:

hive> set MY_CONF='foobar';

Your variable should be listed when you run:

hive> set;

Now, suppose you have the following:
Jar: MyUDF.jar
UDF class: MySampleUDF.java, which accepts a String value.
Table: employee

hive> ADD JAR /MyUDF.jar;
hive> CREATE TEMPORARY FUNCTION testUDF AS 'yourpackage.MySampleUDF';
hive> SELECT testUDF(${hiveconf:MY_CONF}) from employee;
Ranjith Sekar
-3

There are lots of examples shared online, so you can find all the required details with a Google search :).

A small example, as described in the shared links:

hive> ADD JAR assembled.jar;
hive> create temporary function hello as 'com.test.example.UDFExample';
hive> select hello(firstname) from people limit 10;

Please check these links, which I normally use, for reference: Link1 Link2

Maytham Fahmi
Deb
  • I am asking about passing configurations from hive script, where UDF is called, to the UDF. Your links are about how to write hive UDFs, which are not so relevant in this case. – JBT Aug 24 '15 at 03:41
  • My bad. Have you tried Hive Resources, though? [link](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli) It has a feature for adding, deleting, or archiving one or more files, jars, or archives in the list of resources in the distributed cache. – Deb Aug 24 '15 at 04:57