Determine if two strings differ by at most n characters

Question

I am trying to figure out the process in how to approach this question.

Given a list of strings (los), a string s, and an integer bound n, the function should produce True if all the strings in los are at most n characters away from s, and False otherwise.

For example,

Input: (["ccat","hpat","ppat"], "ppat", 2)

Output: True

and for a False case,

Input: (["ccat","hpat","that"], "ppat", 1)

Output: False

Thanks!

You might be looking for [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) — Peri, Jul 08 '20 at 17:45
@Sania This post is useful in this regard https://stackoverflow.com/questions/402504/how-to-determine-a-python-variables-type — nice_dev, Jul 08 '20 at 19:31

score 3 · Answer 1 · answered Jul 08 '20 at 17:45

Part of your solution is called "Levenshtein distance". You can read about more here

You can either implement it yourself or you can find python solution here

after implementing this, you can individually check the distance if it is within N with a loop

nice_dev · Accepted Answer · 2020-07-08T19:11:07.577

As you clarified, you want to find difference between two strings by characters with their positions, you can do the following:

Iterate over your list one by one.
For each word in the list, compare it with the supplied string character by character.
If a character from one doesn't match with the other, increment the diff counter.
If difference becomes greater than needed difference, return False, else return True.
Keep in mind that 2 strings in comparison can be of different lengths. Take that into account too while calculating them.

Snippet:

def isSame(str_list,s,diff_offset):
    s_len = len(s)
    for each_s in str_list:
        diff = abs(len(each_s) - s_len) # initialize with difference of string length
        length = min(len(each_s),s_len)
        for i in range(0,length):
            if s[i] != each_s[i]:
                diff = diff + 1
            if diff > diff_offset:
                return False # if difference is more than required, then return false
    return True


print(isSame(["ccat","hpat","ppat"], "ppat", 2))
print(isSame(["ccat","hpat","ppatru"], "ppat", 2))
print(isSame(["ccat","hpat","that"], "ppat", 1))

Update:

As per the problem statement, if lengths are not same, just return False before beginning to compare strings char by char.

def isSame(str_list,s,diff_offset):
    s_len = len(s)
    for each_s in str_list:
        if len(each_s) != s_len:
            return False
        diff = 0
        for i in range(0,s_len):
            if s[i] != each_s[i]:
                diff = diff + 1
            if diff > diff_offset:
                return False # if difference is more than required, then return false
    return True


print(isSame(["ccat","hpat","ppat"], "ppat", 2))
print(isSame(["ccat","hpat","ppatru"], "ppat", 2))
print(isSame(["ccat","hpat","that"], "ppat", 1))

score 0 · Answer 3 · answered Jul 08 '20 at 17:56

I would use Levenshtein distance to solve it. there is a function named "distance" in the Levenshtein module:

from Levenshtein import distance

the function gets 2 string and return distance between them. which is what are you looking for.

then I would implement it on all the items of the list (comprehension, loop etc..) and check if all true.

Determine if two strings differ by at most n characters

3 Answers3