I have a large grouped dataset in Spark, and for each group I need to return the percentiles from 0.01 to 0.99.
I have found several approaches online, ranging from operations on RDDs ("How to compute percentiles in Apache Spark") to SQLContext functionality ("Calculate quantile on grouped data in spark Dataframe").

My question: does anyone have an opinion on which approach is the most efficient?
As a bonus question: SQLContext provides functions for both percentile_approx and percentile. There isn't much documentation online for percentile. Is it just a non-approximated version of percentile_approx?
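To make the bonus question concrete, the plain-Python sketch below contrasts two percentile definitions. It assumes the semantics described in the Spark SQL function reference: `percentile` is exact and interpolates linearly between the two nearest ranks at position `(n - 1) * p`, while `percentile_approx` returns a value that actually occurs in the column. The nearest-rank rule here is only an illustration of that behaviour on small inputs, not Spark's sketch-based algorithm, and both function names are hypothetical.

```python
import math


def exact_percentile(values, p):
    """Exact percentile with linear interpolation at position (n - 1) * p."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two neighbouring ranks (same value if lo == hi).
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def nearest_rank_percentile(values, p):
    """Smallest observed value v such that at least p * n values are <= v."""
    xs = sorted(values)
    # round() guards against float noise such as 0.3 * 10 != 3 pushing ceil up.
    k = max(1, math.ceil(round(p * len(xs), 9)))
    return xs[k - 1]
```

On `[0, 1, 2, 10]` with `p = 0.5`, `exact_percentile` gives 1.5 (interpolated between 1 and 2), whereas `nearest_rank_percentile` gives 1, a value present in the data.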