I am trying to find a way to calculate percentiles (0.25, 0.75) on a DataFrame column that contains float values:
sqlContext.sql("SELECT percentile(x, 0.5) FROM df")
As far as I understand from the error I got, percentile supports only integer types:
AnalysisException: u'No handler for Hive udf class org.apache.hadoop.hive.ql.udf.UDAFPercentile because: No matching method for class org.apache.hadoop.hive.ql.udf.UDAFPercentile with (float, double). Possible choices: _FUNC_(bigint, array<double>) _FUNC_(bigint, double) .; line 1 pos 43'
So either I need to use
sqlContext.sql("SELECT percentile_approx(x, 0.5) FROM df")
or use a cast:
cast(x as bigint)
Of course, neither of them gives the same result that I get when I calculate the percentile with pandas on the same float values.
How can I get a percentile on float values in Spark 1.6?
One workaround I can think of is to multiply the column by some large number (for instance 10000000) and calculate the percentile on the result as an integer.
Are there any other possible solutions or workarounds?
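To make the scaling idea concrete, here is a minimal pure-Python sketch (no Spark involved). It assumes the percentile is computed with linear interpolation between the closest ranks, which is pandas' default; the names `SCALE`, `percentile_linear` and `percentile_via_scaling` and the sample data are made up for illustration, and `int(v * scale)` only mimics what a `cast(x * 10000000 as bigint)` would do.

```python
import math

SCALE = 10000000  # hypothetical scaling factor, the "big number" from the question


def percentile_linear(values, p):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if lower == upper:
        return float(ordered[lower])
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def percentile_via_scaling(values, p, scale=SCALE):
    """Scale floats to integers (truncating), take the percentile, scale back."""
    scaled = [int(v * scale) for v in values]  # assumption: mimics a bigint cast
    return percentile_linear(scaled, p) / scale


data = [0.1, 0.25, 0.5, 0.75, 1.3]  # made-up sample values

exact = percentile_linear(data, 0.25)                         # on the raw floats
approx = percentile_via_scaling(data, 0.25)                   # scaling workaround
truncated = percentile_linear([int(v) for v in data], 0.25)   # plain cast to integer

print(exact, approx, truncated)
```

In this sketch the scaled version stays close to the float result (the error is bounded by roughly 1 / SCALE), while the plain integer cast throws away the fractional part before the percentile is taken.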
Thanks!