Calculating Kernel Density of every column in a Spark DataFrame

Question

Is there a way to calculate the KDE of every column of a DataFrame?

I have a DataFrame where each column holds the values of one feature. The KDE function of Spark MLlib needs an RDD[Double] of the sample values. The problem is that I need a way to do this without collecting the values of each column, because that would slow the program down too much.

Does anyone have an idea how I could solve this? Sadly, all my attempts so far have failed.

score 0 · Answer 1 · answered Nov 30 '18 at 08:33

You could probably create a new, smaller RDD with the sample function and then run the KDE on that sample to get better performance.
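If sampling is not acceptable, another option follows from how the estimate is defined. Spark MLlib's KernelDensity uses a Gaussian kernel, so the density at each evaluation point is just the average of one kernel term per sample. All columns can therefore be accumulated in a single pass over the rows without collecting any column. Below is a minimal plain-Python sketch of that idea, not Spark code. The names make_zero, seq_op, comb_op and kde_all_columns are made up for illustration, and the sketch assumes every column is evaluated at the same points with the same bandwidth.

```python
import math
from functools import reduce


def make_zero(n_cols, n_points):
    # Accumulator: (kernel sums per column and evaluation point, row count).
    return ([[0.0] * n_points for _ in range(n_cols)], 0)


def seq_op(acc, row, points, bandwidth):
    # Fold one row into the accumulator: each column value contributes
    # one Gaussian kernel term at every evaluation point.
    sums, count = acc
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    for c, value in enumerate(row):
        for p, x in enumerate(points):
            z = (x - value) / bandwidth
            sums[c][p] += norm * math.exp(-0.5 * z * z)
    return (sums, count + 1)


def comb_op(a, b):
    # Merge two partial accumulators (e.g. from two partitions).
    sums_a, n_a = a
    sums_b, n_b = b
    merged = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    return (merged, n_a + n_b)


def kde_all_columns(rows, points, bandwidth):
    # Local stand-in for a distributed aggregate: one pass over the rows,
    # then divide the kernel sums by the number of rows.
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one row")
    zero = make_zero(len(rows[0]), len(points))
    sums, count = reduce(lambda acc, r: seq_op(acc, r, points, bandwidth), rows, zero)
    return [[s / count for s in col] for col in sums]


if __name__ == "__main__":
    data = [(0.0, 10.0), (1.0, 12.0), (2.0, 11.0)]
    # One list of densities per column, evaluated at the given points.
    print(kde_all_columns(data, points=[0.0, 1.0, 11.0], bandwidth=1.0))
```

The accumulator and the two functions have the shape expected by RDD.aggregate(zeroValue, seqOp, combOp), so the same logic could in principle be run on the DataFrame's underlying RDD, with points and bandwidth bound through a lambda or functools.partial. That part is an assumption and has not been verified on a cluster here.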

H Roy

