I am trying to calculate a weighted median in sparklyr, but the weighted.median function in R does not seem to be compatible with sparklyr. I tried pulling the data out of Spark with collect() and then computing the weighted median in regular R, but it hangs and then crashes with an out-of-memory error: the data is so large that it genuinely requires sparklyr in a distributed Hadoop environment. I tried reducing the number of columns and rows to the bare minimum, but I still can't figure out how to get a weighted median in sparklyr without moving the data out of it.
I don't have any code to show, because my approach of pulling the data out of Spark into regular R on a single server crashes when the full data set is collected. Pulling the data from sparklyr into regular R is not a viable approach.