Questions tagged [hive-udf]

Use this tag for questions about user-defined functions (UDFs) in Apache Hive.
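As a rough illustration of what this tag covers, a custom UDF packaged in a JAR is typically registered and called from HiveQL along these lines. The JAR path, function name, and Java class name are hypothetical placeholders, not a real library.

```sql
-- Hypothetical JAR path and Java class; substitute your own.
ADD JAR /tmp/my-udfs.jar;

-- Register the UDF for the current session only.
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.MyLower';

-- Call it like a built-in function (SELECT without FROM requires Hive 0.13+).
SELECT my_lower('Hello World');

-- Show the documentation, if any, registered for the function.
DESCRIBE FUNCTION EXTENDED my_lower;
```

For a function that outlives the session, Hive 0.13 and later also accept `CREATE FUNCTION name AS 'class' USING JAR 'uri'`.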

Apache Hive is a data warehouse system built on top of Hadoop that provides the following:

  • Tools to enable easy data summarization and extract/transform/load (ETL)
  • Ad-hoc querying and analysis of large datasets stored in the Hadoop Distributed File System (HDFS)
  • A mechanism to impose structure on this data
  • A query language called Hive Query Language (HiveQL), which is based on SQL with some additional features such as DISTRIBUTE BY and TRANSFORM, and which lets users familiar with SQL query this data.
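The two HiveQL-specific clauses named above can be sketched as follows. The table `web_logs`, its columns, and the script `parse_url.py` are hypothetical and used only for illustration.

```sql
-- Hypothetical table: web_logs(user_id STRING, url STRING, ts BIGINT).

-- DISTRIBUTE BY: rows with the same user_id go to the same reducer;
-- SORT BY then orders rows within each reducer.
SELECT user_id, url, ts
FROM web_logs
DISTRIBUTE BY user_id
SORT BY user_id, ts;

-- TRANSFORM: stream selected columns through an external script.
-- The hypothetical script reads tab-separated lines on stdin
-- and writes tab-separated lines to stdout.
ADD FILE /tmp/parse_url.py;
SELECT TRANSFORM (user_id, url)
  USING 'python parse_url.py'
  AS (user_id STRING, host STRING)
FROM web_logs;
```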

How to write a good Hive question:

  1. Add a clear textual description of the problem.
  2. Provide the query and/or table DDL, if applicable.
  3. Provide the full exception message.
  4. Provide example input data and the desired output.
  5. Questions about query performance should include the EXPLAIN output of the query.
  6. Do not use pictures for SQL, DDL, DML, data examples, EXPLAIN output and exception messages.
  7. Use proper code and text formatting.
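A minimal sketch of what a question following these guidelines might paste as text. The table, columns, and values are hypothetical; the point is that every piece is copyable text rather than a screenshot.

```sql
-- Items 1-2: table DDL (hypothetical table and columns)
CREATE TABLE orders (order_id INT, seller STRING, amount DOUBLE);

-- Item 4: a small input sample (INSERT ... VALUES requires Hive 0.14+)
INSERT INTO TABLE orders VALUES (1, 'a', 10.0), (2, 'a', 5.5), (3, 'b', 7.0);

-- The query in question
SELECT seller, sum(amount) AS total FROM orders GROUP BY seller;

-- Item 4: desired output, written as plain text
-- a    15.5
-- b    7.0

-- Item 5: for performance questions, include the plan as text
EXPLAIN SELECT seller, sum(amount) AS total FROM orders GROUP BY seller;
```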


64 questions

10 votes, 4 answers

How can we test Hive functions without referencing a table?

I wanted to understand the UDF WeekOfYear and how it determines the first week. I had to artificially hit a table and run the query. I want to compute the values without hitting a table. Secondly, can I look at the UDF source code? SELECT…
vkaul11

9 votes, 2 answers

How to convert a date string from UTC to a specific time zone in Hive?

My Hive table has a date column with UTC date strings. I want to get all rows for a specific EST date. I am trying to do something like the following: Select * from TableName T where TO_DATE(ConvertToESTTimeZone(T.date)) = "2014-01-12" I want to…
Gadam

8 votes, 3 answers

Select all columns of a Hive Struct

I have a requirement to select all columns (fields) of a Hive struct. The Hive CREATE TABLE script is below: Create Table script. Running select * on the table displays each struct as a single column: select * from table. The requirement I have is to display all…
Abhijit Nayak

8 votes, 2 answers

Hive collect_list() does not collect NULL values

I am trying to collect a column that contains NULLs along with some values, but collect_list ignores the NULLs and collects only the non-NULL values. Is there a way to retrieve the NULLs along with the other values? SELECT col1, col2,…
lalith kkvn

6 votes, 1 answer

Find median in spark SQL for multiple double datatype columns

I need to find the median for multiple columns of type double and would appreciate suggestions on the correct approach. Below is my sample dataset with one column; I expect the median returned for my sample to be 1. scala>…
Prabu Soundar Rajan

5 votes, 2 answers

Hive Aggregate function for merging arrays

I need to merge arrays in a GROUP BY in HiveQL. The table schema is something like this: key int, value ARRAY. Here is the SQL I would like to run: SELECT key, array_merge(value) FROM table_above GROUP BY key. If this array_merge function…
kee

3 votes, 1 answer

How to add a JAR for a custom Hive UDF so it is permanently available on an HDInsight cluster?

I have created a custom UDF in Hive; it is tested on the Hive command line and works fine. Now that I have the JAR file for the UDF, what do I need to do so that users will be able to create a temporary function pointing to it? Ideally from the command prompt of…
Dhiraj

2 votes, 1 answer

Hive: Merge two maps into one column

I have a Hive table: create table mySource( col_1 map, col_2 map ). Here is what a record might look like: col_1 col_2 {"a":1, "b":"2"} {"c":3, "d":"4"}. My target table looks like…
AbtPst

2 votes, 1 answer

Hive UDF: Generic UDF for all primitive types

I am trying to implement a Hive UDF with a parameter, so I am extending the GenericUDF class. The problem is that my UDF works fine on the String data type, but it throws an error if I run it on other data types. I want the UDF to run regardless of data type. Would…
Gaurang Shah

2 votes, 0 answers

Hive UDF class getting instantiated for each function call

I have created a Hive UDF class and registered its function in Spark. I call this function in a Hive query through the Spark session object. When I run my code, I observe that each time the function is called, a new instance of the UDF class is created. Is it…
Hardik

2 votes, 2 answers

How to split delimited String to multiple rows in Hive using lateral view explode

I have a table in Hive as below: create table somedf (sellers string, orders int); insert into somedf values ('1--**--2--**--3', 50), ('1--**--2', 10); The table has a column called sellers, and it is delimited by the characters described in…
Regressor

2 votes, 1 answer

Update the JAR of a Hive UDF

TL;DR: how can I update the JAR of a custom UDF in Hive? I wrote my own (generic) UDF, and it works very well. I can define a new function and use it with the command: Now I want to update my UDF, so I want to put an updated version of the JAR, with…
Guillaume

2 votes, 2 answers

Hive: How to check if values of one array are present in another?

I have two arrays like these, which are returned from a UDF I created: array A - [P908,S57,A65] array B - [P908,S57]. I need to check whether the elements of array A are present in array B, or the elements of array B are present in array A, using Hive…
Kuwali

2 votes, 0 answers

Solution for "select transform" for a Python UDF in Hive

Is there a way to avoid including all the columns in select transform() yet still get all the columns in the output? For example, I have columns in a Hive table like c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, and I am performing the transform on columns c8, c9,…

2 votes, 2 answers

How are Hive UDFs, UDAFs, and UDTFs written in Java debugged in an IDE like Eclipse?

For example, for debugging Pig UDFs, this works: http://ben-tech.blogspot.ie/2011/08/how-to-debug-pig-udfs-in-eclipse.html I have a Hive script in which I use my UDAF, which is failing, so I would like to step through the UDF code.
shrewquest