I've read the Spark documentation section on Locality Sensitive Hashing, but I still don't understand parts of it:
https://spark.apache.org/docs/latest/ml-features.html#locality-sensitive-hashing
The Bucketed Random Projection example there works on two DataFrames, but I have a single, simple spatial dataset of points (of course it will later contain millions of points). The DataFrame looks like this (a sketch of how I build it in Spark follows below the table):
 id        X        Y
  1  11.6133  48.1075
  2  11.6142  48.1066
  3  11.6108  48.1061
  4  11.6207  48.1192
  5  11.6221  48.1223
  6  11.5969  48.1276
  7  11.5995  48.1258
  8  11.6127  48.1066
  9  11.6430  48.1275
 10  11.6368  48.1278
 11  11.5930  48.1156
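For context, here is roughly how I create this DataFrame in Spark (just a sketch of my setup; the column names id, X, Y and the use of VectorAssembler to pack the coordinates into a "features" vector are my own choices, since LSH seems to want a single vector column):

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("lsh-points").getOrCreate()

# The sample points from the table above: (id, X, Y)
points = [
    (1, 11.6133, 48.1075), (2, 11.6142, 48.1066), (3, 11.6108, 48.1061),
    (4, 11.6207, 48.1192), (5, 11.6221, 48.1223), (6, 11.5969, 48.1276),
    (7, 11.5995, 48.1258), (8, 11.6127, 48.1066), (9, 11.6430, 48.1275),
    (10, 11.6368, 48.1278), (11, 11.5930, 48.1156),
]
df = spark.createDataFrame(points, ["id", "X", "Y"])

# LSH works on a single vector column, so pack X and Y into "features"
assembler = VectorAssembler(inputCols=["X", "Y"], outputCol="features")
df_features = assembler.transform(df)
df_features.show(truncate=False)
```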
My question is: how can I put points that are close to each other into the same groups, so that my original DataFrame gets an additional column with these hashes / group ids?
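Concretely, this is the kind of thing I was hoping would work on a single DataFrame (only a rough sketch: bucketLength=0.01 and numHashTables=1 are values I guessed, and df_features is the assembled DataFrame from the sketch above). Is transforming one DataFrame like this and then grouping on the hash column the intended approach, or am I misusing the API?

```python
from pyspark.ml.feature import BucketedRandomProjectionLSH

# Fit an LSH model on the points; bucketLength controls how coarse the buckets are
brp = BucketedRandomProjectionLSH(
    inputCol="features",
    outputCol="hashes",   # column that should hold the bucket hashes
    bucketLength=0.01,    # guessed: roughly the size of one "neighbourhood" in degrees
    numHashTables=1,
)
model = brp.fit(df_features)

# transform() keeps the original columns and appends the "hashes" column,
# so points falling into the same bucket should share the same hash value
df_hashed = model.transform(df_features)
df_hashed.show(truncate=False)
```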
Best, Marcin