I have a csv
file contains some data, I want select the similar data with an input.
my data is like:
H1 | H2 | H3
--------+---------+----------
A | 1 | 7
B | 5 | 3
C | 7 | 2
And the data point that I want find data similar to that in my csv
is like : [6, 8]
.
Actually I want find rows that H2
and H3
of data set is similar to input, and It return H1
.
I want use pyspark and some similarity measure like Euclidean Distance, Manhattan Distance, Cosine Similarity or machine learning algorithm.