
I am trying to calculate percentile values of column x using a Hive UDF. However, when I run the following with spark-submit, I get a runtime exception:

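import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox
import org.apache.spark.sql.functions.{callUDF, col, lit}

// 5th percentile of column x, via the Hive UDAF looked up by name
val x = df_grouped.select(callUDF("percentile_approx", col("x"), lit(0.05))).head.getDouble(0)
println(x)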

The exception is:

org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
    at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonf
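A likely cause, hedged because the Spark version isn't stated: on Spark 1.x, `percentile_approx` is a Hive UDAF that is only registered when the DataFrame comes from a `HiveContext` (not a plain `SQLContext`), and importing `GenericUDAFPercentileApprox` does not register it. From Spark 2.1 it is a built-in aggregate. The sketch below assumes Spark 2.1+ with `spark-sql` on the classpath; the column name `x` is from the question, while the object name, method name, and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object PercentileApproxSketch {
  // Approximate 5th percentile of column "x" using the SQL aggregate percentile_approx.
  def fifthPercentile(spark: SparkSession): Double = {
    import spark.implicits._
    // Hypothetical stand-in for df_grouped: the values 1.0 to 100.0 in a column named x.
    val df = (1 to 100).map(_.toDouble).toDF("x")
    // expr() parses the SQL call; on Spark 2.1+ no Hive import or HiveContext is needed.
    // The result keeps the input type (Double here), so getDouble(0) is valid.
    df.select(expr("percentile_approx(x, 0.05)")).head.getDouble(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("percentile-approx-sketch")
      .master("local[*]") // local mode for the sketch; omit when the master is set by spark-submit
      .getOrCreate()
    try println(fifthPercentile(spark))
    finally spark.stop()
  }
}
```

On Spark 1.x, the equivalent assumption is that `df_grouped` is created through `new org.apache.spark.sql.hive.HiveContext(sc)` (which needs the `spark-hive` module) rather than a plain `SQLContext`.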

Suggestions for a better way to calculate the percentile would also be appreciated.
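One alternative that avoids SQL function lookup entirely is `DataFrameStatFunctions.approxQuantile`, available as `df.stat.approxQuantile` from Spark 2.0. It takes a column name, an array of probabilities, and a relative error, and returns one value per probability. A minimal sketch, assuming Spark 2.0+; the helper name `percentiles` and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ApproxQuantileSketch {
  // Percentiles of column "x" via DataFrameStatFunctions.approxQuantile (Spark 2.0+).
  // relativeError = 0.0 requests exact quantiles, which is more expensive on large data;
  // a small positive value such as 0.001 trades accuracy for speed and memory.
  def percentiles(df: DataFrame, probs: Array[Double], relativeError: Double): Array[Double] =
    df.stat.approxQuantile("x", probs, relativeError)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("approx-quantile-sketch")
      .master("local[*]") // local mode for the sketch
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical data standing in for df_grouped.
      val df = (1 to 100).map(_.toDouble).toDF("x")
      // Several percentiles are computed in a single pass over the data.
      val Array(p05, p50, p95) = percentiles(df, Array(0.05, 0.5, 0.95), 0.0)
      println(s"p05=$p05 p50=$p50 p95=$p95")
    } finally spark.stop()
  }
}
```

Because the result is a plain `Array[Double]` on the driver, there is no `Row` to unpack, unlike the `select(...).head.getDouble(0)` pattern in the question.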

Will Vousden
Neel
