
[Image: table with a Fruit column and the desired Cluster column]

I'm wondering if anyone knows how to go about doing this in Python.

Is there a way to fuzzy match the first column (Fruit) against itself and build an output column like the second column (Cluster)?

Thank you!

  • There are a huge number of ways to go about this. For example, [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) is the number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into another. Or there are [semantic similarity](https://stackoverflow.com/questions/21979970/how-to-use-word2vec-to-calculate-the-similarity-distance-by-giving-2-words) methods, which care about meaning rather than letters. Before you can do this, you need to decide what "similar" means. – Nick ODell Nov 30 '22 at 19:51
  • I would say the first. – learn2learn Nov 30 '22 at 20:01
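
Going with the Levenshtein option from the comments, here is a minimal sketch of one way this could be done with only the standard library. It is not the only approach: it greedily assigns each value to the first earlier value within a small edit distance, and uses that earlier value as the cluster label. The function names `levenshtein` and `cluster_labels`, the `max_distance=2` threshold, and the sample fruit list are all assumptions made for illustration; the actual data from the image is not reproduced here.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def cluster_labels(values: List[str], max_distance: int = 2) -> List[str]:
    """Label each value with the first earlier value within max_distance edits.

    Hypothetical helper: the greedy strategy and the threshold are
    illustrative assumptions, not a canonical clustering method.
    """
    representatives: List[str] = []
    labels: List[str] = []
    for value in values:
        key = value.strip().lower()
        for rep in representatives:
            if levenshtein(key, rep.strip().lower()) <= max_distance:
                labels.append(rep)
                break
        else:
            # No close match so far, so this value starts a new cluster.
            representatives.append(value)
            labels.append(value)
    return labels


# Made-up sample data standing in for the Fruit column.
fruits = ["apple", "aple", "banana", "bananna", "orange"]
clusters = cluster_labels(fruits)
print(list(zip(fruits, clusters)))
```

If the data lives in a pandas DataFrame with a `Fruit` column (an assumption based on the question), something like `df["Cluster"] = cluster_labels(df["Fruit"].tolist())` should build the second column. The threshold will likely need tuning, since a fixed edit distance treats short and long strings very differently, and the result depends on the order of the rows.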
