Spark SQL: Percentile Calculation using Hive Generic Functions

Asked Mar 23 '16 at 19:17

Active Jan 02 '18 at 15:40

Viewed 1,422 times

I am trying to calculate percentile values of a column x using the Hive UDAF percentile_approx. However, when I run the following code with spark-submit, it fails with a runtime exception:

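import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox
import org.apache.spark.sql.functions.{callUDF, col, lit}

// Approximate 5th percentile of column x
val x = df_grouped.select(callUDF("percentile_approx", col("x"), lit(0.05))).head.getDouble(0)
println(x)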

The exception is:

org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonf

Suggestions for a better way to calculate the percentile value would also be appreciated.

edited Jan 02 '18 at 15:40

Will Vousden

asked Mar 23 '16 at 19:17

Neel

Can you check whether your sqlContext is an instance of HiveContext? – eliasah Jun 07 '16 at 08:46
How do I check that? – Neel Jun 08 '16 at 13:35
Read this answer: http://stackoverflow.com/a/36172311/3415409 – eliasah Jun 08 '16 at 13:39
I am using a sqlContext that is not an instance of HiveContext. – Neel Jun 08 '16 at 13:50
You need to use a HiveContext in order to run Hive-specific functions. See my answers about HiveContext [here](http://stackoverflow.com/search?q=user%3A3415409+%5Bapache-spark%5D+hivecontext). – eliasah Jun 08 '16 at 13:55
So did it solve your issue? – eliasah Jun 09 '16 at 05:14
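
For reference, the suggestion in the comments above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes Spark 1.x with the spark-hive module on the classpath, and the object name, the helper names `isHiveEnabled` and `approxPercentile`, and the sample data standing in for `df_grouped` are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{callUDF, col, lit}
import org.apache.spark.sql.hive.HiveContext

object PercentileApproxSketch {

  // The check asked about in the comments: is this context Hive-enabled?
  def isHiveEnabled(ctx: SQLContext): Boolean = ctx.isInstanceOf[HiveContext]

  // Same call as in the question. The function name is looked up in the
  // registry of the context that created df, so df must come from a HiveContext.
  def approxPercentile(df: DataFrame, column: String, p: Double): Double =
    df.select(callUDF("percentile_approx", col(column), lit(p))).head.getDouble(0)

  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("percentile-approx-sketch"))

    // HiveContext extends SQLContext and adds Hive's built-in functions.
    val sqlContext: SQLContext = new HiveContext(sc)
    import sqlContext.implicits._

    println(isHiveEnabled(sqlContext))

    // Hypothetical stand-in for df_grouped: a single double column named x.
    val df = (1 to 100).map(i => Tuple1(i.toDouble)).toDF("x")
    println(approxPercentile(df, "x", 0.05))

    sc.stop()
  }
}
```

The context is typed as `SQLContext` on purpose, so the `isInstanceOf` check mirrors what the asker would run against an existing `sqlContext`. On Spark 2.x the corresponding setup would be `SparkSession.builder().enableHiveSupport()` rather than constructing a `HiveContext` directly.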

0 Answers