I have the following input for my problem:
ID -> List of Words
(101 -> Array("a1","b2","c4","d2"))
(102 -> Array("a6","b1","c5","d3"))
(103 -> Array("a1","b4","c4","d2"))
(104 -> Array("a2","b2","c3","d2"))
(105 -> Array("a7","b6","c1","d3"))
Now I want to measure the similarity between these input statements and group the ones that are close to each other.
Example output:
(101 -> Array("a1","b2","c4","d2"))
(103 -> Array("a1","b4","c4","d2"))
(104 -> Array("a2","b2","c3","d2"))
In the example output, the statements are very similar to each other.
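To make "similar" concrete, here is a minimal sketch of one measure I could imagine using. It is only an assumption on my part, not a requirement: Jaccard similarity, i.e. the number of shared words divided by the number of distinct words across both statements. The names `jaccard`, `input` and `scored` are made up for illustration, and this is plain Scala on a local collection, not Spark code.

```scala
// Hypothetical helper: Jaccard similarity = |A ∩ B| / |A ∪ B|.
// Word order and duplicates are ignored because the arrays are turned into sets.
def jaccard(a: Array[String], b: Array[String]): Double = {
  val (sa, sb) = (a.toSet, b.toSet)
  val unionSize = (sa union sb).size
  if (unionSize == 0) 0.0 else (sa intersect sb).size.toDouble / unionSize
}

// The sample input from above, keyed by ID.
val input: Map[Int, Array[String]] = Map(
  101 -> Array("a1", "b2", "c4", "d2"),
  102 -> Array("a6", "b1", "c5", "d3"),
  103 -> Array("a1", "b4", "c4", "d2"),
  104 -> Array("a2", "b2", "c3", "d2"),
  105 -> Array("a7", "b6", "c1", "d3")
)

// Score every unordered pair of IDs and sort by similarity, highest first.
val scored: Seq[(Int, Int, Double)] = (for {
  (id1, w1) <- input.toSeq
  (id2, w2) <- input.toSeq
  if id1 < id2
} yield (id1, id2, jaccard(w1, w2))).sortBy(-_._3)

scored.foreach { case (id1, id2, s) => println(f"$id1 - $id2: $s%.3f") }
```

With this measure, 101 and 103 share three of five distinct words, so they score highest. Comparing every pair like this is quadratic in the number of statements, which is why I am looking for a way to do it with Spark.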
How can I achieve this using Spark? I am open to any approach, whether hand-written logic or a machine learning algorithm.
Thanks