-1

I would like to know how to compare two strings with a function in Python. More specifically, I want to measure how similar the two strings are and express that similarity as a percentage (based on the letters that appear in both strings). Thanks in advance.

bahaaz

4 Answers

1

You might look at difflib for various ways of comparing strings and getting their differences. difflib.Differ().compare(a, b) takes two sequences of lines (if you pass two plain strings, they are compared character by character) and returns a generator that produces delta lines. Lines prefixed with "- " appear only in the first sequence, lines prefixed with "+ " appear only in the second, and lines prefixed with two spaces appear in both.
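As a rough illustration of the prefixes described above, here is a minimal sketch that runs Differ on two short strings character by character. The helper names char_diff and count_common are made up for this example and are not part of difflib.

```python
import difflib


def char_diff(s1, s2):
    """Return the Differ delta for s1 and s2, one entry per character."""
    # A plain string is iterated as a sequence of one-character items,
    # so each delta entry is a two-character prefix followed by one character.
    return list(difflib.Differ().compare(s1, s2))


def count_common(s1, s2):
    """Count the delta entries marked as present in both strings."""
    # Hypothetical helper: entries starting with two spaces are common.
    return sum(1 for entry in char_diff(s1, s2) if entry.startswith("  "))


if __name__ == "__main__":
    for entry in char_diff("abc", "abd"):
        print(repr(entry))
    print(count_common("abc", "abd"))
```

Note that this comparison is order-sensitive: it aligns the two strings rather than simply checking which letters they share.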

Morten Siebuhr
Pierce
1
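def pctSame(s1, s2):
    """Return the fraction (0.0 to 1.0) of characters shared by s1 and s2."""
    # Make sorted lists of each string's characters
    s1c = sorted(s1)
    s2c = sorted(s2)
    total = len(s1c) + len(s2c)
    if total == 0:
        # Avoid dividing by zero; treating two empty strings as identical
        # is an assumption, pick whatever convention suits you
        return 1.0
    i1 = 0
    i2 = 0
    same = 0
    # "Merge" the sorted lists, counting matches
    while i1 < len(s1c) and i2 < len(s2c):
        if s1c[i1] == s2c[i2]:
            # A match accounts for one character in each string
            same += 2
            i1 += 1
            i2 += 1
        elif s1c[i1] < s2c[i2]:
            i1 += 1
        else:
            i2 += 1
    # Return ratio of matching chars to total chars (multiply by 100 for a percentage)
    return same / float(total)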
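
The same count of shared characters can probably be had more compactly with collections.Counter, whose & operator keeps the minimum count of each character. This is only a sketch of an equivalent approach; the name pct_same_counter is made up, and returning 1.0 for two empty strings is an assumption.

```python
from collections import Counter


def pct_same_counter(s1, s2):
    """Return the fraction (0.0 to 1.0) of characters shared by s1 and s2."""
    total = len(s1) + len(s2)
    if total == 0:
        # Assumption: two empty strings count as identical.
        return 1.0
    # Counter & Counter keeps, for each character, the smaller of its two counts,
    # which is the size of the multiset intersection.
    shared = sum((Counter(s1) & Counter(s2)).values())
    # Each shared character is counted once for each string.
    return 2.0 * shared / total


if __name__ == "__main__":
    print(pct_same_counter("hello", "world") * 100)
```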
Scott Hunter
0

String similarity is a metric that depends on what you are measuring. Are you trying to match a mistyped word to the intended word in the dictionary? Comparing DNA or protein sequences? Trying to do document retrieval based on similarity to a search query? Doing fuzzy name matching? For each of these tasks, a different algorithm might be appropriate. If you're really asking a fully general question, you might start by reading about Levenshtein distance.
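
If Levenshtein distance turns out to be the right fit, a minimal pure-Python sketch of the usual dynamic-programming formulation might look like this. Turning the distance into a 0 to 1 score by dividing by the longer length is just one common convention, and both function names are made up for the example.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Map the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings count as identical.
        return 1.0
    return 1.0 - levenshtein(a, b) / float(longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein_similarity("kitten", "sitting") * 100)
```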

alexis
0

The SequenceMatcher class from difflib is almost what you're looking for. Its ratio() method returns a score between 0 and 1 that reflects how closely the two strings resemble each other.
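
A minimal sketch of that score scaled to a percentage; the wrapper name similarity_percent is made up, while SequenceMatcher and ratio() come from the standard library.

```python
from difflib import SequenceMatcher


def similarity_percent(s1, s2):
    """Return the SequenceMatcher similarity of s1 and s2 as a percentage."""
    # The first argument is the optional "junk" predicate; None disables it.
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


if __name__ == "__main__":
    print(similarity_percent("abcd", "bcde"))
```

Keep in mind that the score depends on character order, so it is not quite the same thing as counting the letters that appear in both strings.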

Morten Siebuhr