I have a list of strings and some query strings that look like this:
mylist = ["the yam is sweet", "what is the best time to come", "who ate my food", "no empty food on the table", "what can I do to make you happy"] # about 20k strings in total
myString1 = "Is yam a food" # String can be longer than this
myString2 = "should I give you a food"
myString3 = "I am not happy"
I want to compare each myString to every string in my list and collect the similarity percentages in three different lists, so the end result will look like this:
similar_string1 = [70, 0.5, 50, 55, 2]
similar_string2 = [50, 0.5, 70, 85, 2]
similar_string3 = [20, 15, 0, 5, 80]
So myString1 will be compared to each string in mylist to calculate a percentage similarity, and the same for myString2 and myString3. Each set of percentages is then collected in its own list as seen above.
I read that one can use TF-IDF to vectorize mylist and each myString, then use cosine similarity to compare them, but I have never worked on something like this before and would love it if anyone has an idea, process, or code that will help me get started.
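Here is a rough sketch of the TF-IDF + cosine-similarity idea in plain Python, just to show the mechanics (the helper names `tfidf_vectors` and `cosine` are made up for illustration; for 20k strings you would probably want scikit-learn's `TfidfVectorizer` and `cosine_similarity` instead of hand-rolled dicts):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term -> weight dicts) for a list of strings."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()                      # document frequency of each term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}   # smoothed IDF
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors, as a percentage 0..100."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return 100.0 * dot / (na * nb)

mylist = ["the yam is sweet", "what is the best time to come",
          "who ate my food", "no empty food on the table",
          "what can I do to make you happy"]
queries = ["Is yam a food", "should I give you a food", "I am not happy"]

# Fit TF-IDF on the corpus and the queries together so they share one vocabulary.
vecs = tfidf_vectors(mylist + queries)
corpus_vecs, query_vecs = vecs[:len(mylist)], vecs[len(mylist):]

# One list of percentages per query, matching similar_string1..3 above.
similar = [[round(cosine(qv, cv), 1) for cv in corpus_vecs] for qv in query_vecs]
for q, row in zip(queries, similar):
    print(q, row)
```

The exact numbers will not match the example lists in the question (those look like placeholders), but the shape of the result is the same: three lists of five percentages. With scikit-learn, `TfidfVectorizer().fit_transform(mylist + queries)` followed by `cosine_similarity` gives the same idea with sparse matrices, which matters at 20k strings.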
Thanks