I'm working on a project where I need to check whether string1 is approximately present in string2. If it is (i.e. if the match ratio exceeds some threshold, say delta), I need to extract the matched segment from string2 and save it.
string1 will range from 100 to 200 characters. string2 will be much longer, anywhere between 15000 and 20000 characters.
Here are the examples I am currently using:
string1 = "MA A NA E LA OO KA A SA A BHA I YA A BA A HA U MA A DA A DA A A NGA GA I KA AA RA A PA A DDA A DA A NA A NA TA A RA A BA MA A SA U DA EE GA AA JA A SA A BHA E GA E BA A NA DA I TA U"
string2 = string2
I've tried the fuzzywuzzy library and SequenceMatcher (from difflib) in Python, but with these I can only get the similarity ratio; I haven't found a way to extract the matching substring from string2.
from fuzzywuzzy import fuzz
print(fuzz.partial_ratio(string1,string2))
Running fuzzywuzzy's partial_ratio on the two strings gives me a ratio of 89.
I need to get the approximate substring of string2, which should be about the same length as string1. In other words, I need the location in string2 where that 89% match occurs.
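To make the goal concrete, here is a minimal sketch of one possible way to recover the matched location using only difflib from the standard library. It is not fuzzywuzzy's API: the function name `best_partial_match` and the parameter names `needle` and `haystack` are hypothetical, chosen for illustration. The idea is to take each matching block between the two strings as a candidate alignment, cut a window of string2 the same length as string1 at that alignment, and keep the window with the highest ratio. The score it returns is difflib's ratio on a 0 to 1 scale and is not guaranteed to equal the 89 reported above.

```python
from difflib import SequenceMatcher


def best_partial_match(needle, haystack):
    """Return (ratio, start, end, segment) for the window of `haystack`
    that is most similar to `needle`.

    Hypothetical helper for illustration; not part of fuzzywuzzy.
    """
    # autojunk=False: difflib's default heuristic treats very frequent
    # characters in long sequences as junk, which would hide matches
    # in text built from a small alphabet.
    matcher = SequenceMatcher(None, needle, haystack, autojunk=False)

    best = (0.0, 0, 0, "")
    for block in matcher.get_matching_blocks():
        # Align the start of needle with haystack using this block's offsets.
        start = max(block.b - block.a, 0)
        end = min(start + len(needle), len(haystack))
        window = haystack[start:end]

        ratio = SequenceMatcher(None, needle, window, autojunk=False).ratio()
        if ratio > best[0]:
            best = (ratio, start, end, window)
    return best


if __name__ == "__main__":
    ratio, start, end, segment = best_partial_match(
        "hello world", "xxxx hellp world yyyy"
    )
    print(round(ratio * 100), start, end, repr(segment))
```

With string1 and string2 from above, `best_partial_match(string1, string2)` would give the slice bounds, so `string2[start:end]` is the segment to save when `ratio` exceeds delta. On inputs of 15000 to 20000 characters this may be slow, since it is plain Python with no optimisation.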