I am trying to calculate percentile values of a column x using the Hive UDF percentile_approx.
However, when I run the following code with spark-submit, it fails with a runtime exception:
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox
import org.apache.spark.sql.functions.{callUDF, col, lit}
val x = df_grouped.select(callUDF("percentile_approx",col("x"),lit(0.05))).head.getDouble(0)
println(x)
The exception is:
org.apache.spark.sql.AnalysisException: undefined function percentile_approx;
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonf
If there is a better way to calculate a percentile, I would appreciate hearing about that as well.
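
For context, here is a minimal sketch of one possible alternative. It assumes Spark 2.0 or later, where DataFrame.stat.approxQuantile is available and needs no Hive function registry. The object name ApproxPercentileSketch, the helper approxPercentile, and the toy DataFrame standing in for df_grouped are hypothetical and only for illustration. The SimpleFunctionRegistry frame in the trace also suggests the query ran against a plain SQLContext; in Spark 1.x, Hive functions such as percentile_approx are resolved only through a HiveContext, so that may be worth checking as well.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ApproxPercentileSketch {

  // Hypothetical helper: approximate quantile p (0.0 to 1.0) of a numeric column.
  // relativeError = 0.0 asks for the exact quantile, at a higher cost.
  // Note: on an empty DataFrame approxQuantile returns an empty array, so .head would throw.
  def approxPercentile(df: DataFrame, column: String, p: Double,
                       relativeError: Double = 0.01): Double =
    df.stat.approxQuantile(column, Array(p), relativeError).head

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; drop .master(...) when launching with spark-submit.
    val spark = SparkSession.builder()
      .appName("approx-percentile-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for df_grouped (assumption): one Double column named "x".
    val df = (1 to 100).map(_.toDouble).toDF("x")

    // 5th percentile of x, within the requested relative error.
    println(approxPercentile(df, "x", 0.05))

    spark.stop()
  }
}
```

Because approxQuantile is an ordinary DataFrame method rather than a registered SQL function, it does not depend on whether Hive support is enabled in the session.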