I am trying to create a UDF in Hive. This UDF has to auto-increment a Hive table column called id.

The following is the Java code to create the UDF:
package myudf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic = false, stateful = true)
public class autoincrement extends UDF {
    int lastValue;

    public int evaluate() {
        lastValue++;
        return lastValue;
    }
}
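My current understanding (which may be wrong) is that state like lastValue lives on a single UDF instance, and Hive may create a fresh instance per query or per task. A quick Python sketch of that suspicion (hypothetical, not Hive itself):

```python
class AutoIncrement:
    """Mimics the stateful UDF above: the counter lives on the instance."""

    def __init__(self):
        self.last_value = 0

    def evaluate(self):
        self.last_value += 1
        return self.last_value


# Two separate instances, e.g. one per query or per task JVM:
first = AutoIncrement()
second = AutoIncrement()
print(first.evaluate())   # 1
print(second.evaluate())  # 1 again, not 2 -- the state is not shared
```

If that is what happens, each INSERT would start counting from 1 again, which matches the output I am seeing below.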
Now I am able to create a jar file and add the jar file to Hive like below:
add jar /home/cloudera/Desktop/increment.jar;
Then I create a temporary function:
create temporary function inc as 'myudf.autoincrement';
Then I create the table like below:
Create table abc(id int, name string);
Insert values:
INSERT into TABLE abc SELECT inc() as id, 'Tim';
Run the select statement:
select * from abc;
Output:
1 Tim
Insert values:
INSERT into TABLE abc SELECT inc() as id, 'John';
Run the select statement:
select * from abc;
Output:
1 Tim
1 John
But when I inserted values the second time, the output I expected was:
1 Tim
2 John
How do I get the expected output? What should I change in the Java code to get the desired result?

Also, can I use the same function in Spark as well?
In Spark, when I do

sqlContext.sql("show functions")

it shows the list of all functions available in Hive. But when I do

sqlContext.sql("INSERT into TABLE abc SELECT inc() as id, 'Jim'")

I get the error below:
pyspark.sql.utils.AnalysisException: u'undefined function inc; line 1 pos 29'
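I assumed I would also need to register the jar and the function inside the Spark session before calling inc(). This is the kind of setup I mean (a sketch for Spark 1.x with a HiveContext; the paths and names are the ones from above, and it needs a running cluster with Hive support, so it may still not fix the counting problem):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext  # requires a Spark build with Hive support

sc = SparkContext(appName="inc-udf")
sqlContext = HiveContext(sc)

# Register the jar and the Hive UDF in this Spark session as well
sqlContext.sql("ADD JAR /home/cloudera/Desktop/increment.jar")
sqlContext.sql("CREATE TEMPORARY FUNCTION inc AS 'myudf.autoincrement'")

sqlContext.sql("INSERT into TABLE abc SELECT inc() as id, 'Jim'")
```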
How can I create the same UDF in pyspark and get the desired output?

Also, what happens when the insert statements are executed at the same time?
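To illustrate what I mean by "at the same time": even if the counter were shared, concurrent increments would need synchronization. A plain Python sketch (no Hive involved; the names here are mine):

```python
import threading

counter = 0
lock = threading.Lock()

def insert_rows(n):
    """Simulate n concurrent inserts, each taking the next id."""
    global counter
    for _ in range(n):
        with lock:        # without this, increments can be lost
            counter += 1

threads = [threading.Thread(target=insert_rows, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000 with the lock; possibly less without it
```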