I am looking for an algorithm, preferably in Python that would help me locate substrings, N characters long, of exisiting strings that are closest to a target string N character long.
Consider the target string, that is, say, 4 characters long, to be:
targetString -> '1111'
Assume this is the string I have available with me ( I will generate substrings of this for "best alignment" matching ):
nonEmptySubStrings -> ['110101']
Substrings of the above that are 4 characters long:
nGramsSubStrings -> ['0101', '1010', '1101']
I want to write/use a "Magic Function" that would select the string closest to targetString :
someMagicFunction -> ['1101']
Some more examples:
nonEmptySubStrings -> ['101011']
nGramsSubStrings -> ['0101', '1010', '1011']
someMagicFunction -> ['1011']
nonEmptySubStrings -> ['10101']
nGramsSubStrings -> ['0101', '1010']
someMagicFunction -> ['0101', '1010']
Is this "Magic Function" a well known substring problem?
I really want to find the min. number of changes in nonEmptySubStrings so that it would have targetString as a substring.