Is there a way to calculate the KDE of every column of a DataFrame?

I have a DataFrame where each column holds the values of one feature. The KernelDensity class of Spark MLlib needs an RDD[Double] of the sample values. The problem is that I need a way to do this without collecting the values of each column, because that would slow the program down too much.

Does anyone have an idea how I could solve this? Sadly, all my attempts have failed so far.

Markus Wilhelm

1 Answer

You could probably create a smaller RDD with the RDD's sample function and run the KDE on that sample instead of on every value, which should improve performance.
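A minimal sketch of that idea in Scala, assuming every column is numeric (or can be cast to double). The names ColumnKde, kdePerColumn, fraction and bandwidth are made up for illustration and are not part of Spark. Each column is turned into an RDD[Double] with select and map, so nothing is collected to the driver; only the densities at the evaluation points come back.

```scala
import org.apache.spark.mllib.stat.KernelDensity
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnKde {
  // Hypothetical helper: estimates a Gaussian KDE for each column of `df`
  // at the given evaluation `points`.
  def kdePerColumn(
      df: DataFrame,
      points: Array[Double],
      bandwidth: Double,
      fraction: Double): Map[String, Array[Double]] = {
    df.columns.map { name =>
      // Build an RDD[Double] for this column; the values stay distributed.
      val values = df
        .select(col(name).cast("double"))
        .na.drop()
        .rdd
        .map(_.getDouble(0))

      // Optionally work on a random subset, as suggested in the answer.
      val sample =
        if (fraction < 1.0) values.sample(withReplacement = false, fraction)
        else values

      val densities = new KernelDensity()
        .setSample(sample)
        .setBandwidth(bandwidth)
        .estimate(points)

      name -> densities
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-kde")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real feature DataFrame.
    val df = Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)).toDF("a", "b")

    val result = kdePerColumn(df, Array(0.0, 2.0, 20.0), bandwidth = 1.0, fraction = 1.0)
    result.foreach { case (name, d) => println(s"$name: ${d.mkString(", ")}") }

    spark.stop()
  }
}
```

This runs one Spark job per column, so with many columns it may help to cache the DataFrame first. The bandwidth and the sampling fraction are placeholders that would need tuning for real data.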

H Roy