I need help figuring out how to print the difference between two strings. The output should be the exact characters that differ, which is sometimes more than one character. Example strings:

str1 = "0074574-1"
str2 = "0074574+1"

The difference here is the "+" in place of the "-". Another example is:

str1 = "27785-74-1"
str2 = "27785%F274-1"

In this example the "-" is replaced by the characters "%F2". I need code that finds the difference between str1 and str2 and outputs it as:

diff_between_str1_str2 = "-"

or

diff_between_str1_str2 = "%F2"

Can someone please point me in the right direction as to where I can begin to look?

Olvin Roght
MRHT
    There's `difflib` and specifically [`difflib.ndiff()`](https://docs.python.org/3/library/difflib.html#difflib.ndiff) – Olvin Roght Jul 02 '21 at 18:09
    Unless you're doing a character-by-character comparison of two equal-length strings, which your examples show you aren't, this is a big topic that involves the concept of [edit distance](https://en.wikipedia.org/wiki/Edit_distance). There is no single, perfect answer to this, so you may want to do some research. There are libraries which can help with this. – Tom Karzes Jul 02 '21 at 18:12
    This sounds like a variation of the Levenshtein distance algorithm. Here is a good post I've found on it: https://blog.paperspace.com/implementing-levenshtein-distance-word-autocomplete-autocorrect/#:~:text=The%20Levenshtein%20distance%20is%20a,transform%20one%20word%20into%20another. – bpgeck Jul 02 '21 at 18:12

1 Answer

You can apply difflib.ndiff():

from difflib import ndiff

str1 = "27785-74-1"
str2 = "27785%F274-1"
# Each ndiff() item looks like "+ x", "- x" or "  x"; keep the characters marked "+".
diff_between_str1_str2 = "".join(s[-1] for s in ndiff(str2, str1) if s[0] == "+")

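Internally, ndiff() uses difflib.SequenceMatcher, which is based on Gestalt pattern matching. With the arguments in this order, the "+" entries are the characters found only in str1, so the result is "-"; calling ndiff(str1, str2) instead gives "%F2".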
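
If you want both sides of each change at once, you could also call SequenceMatcher directly. This is only a sketch: the helper name `changed_spans` is made up for illustration, and passing `autojunk=False` is my own choice so that the junk heuristic never kicks in on long inputs. It relies on `get_opcodes()`, which reports the spans where the two strings differ:

```python
from difflib import SequenceMatcher


def changed_spans(a, b):
    """Return (text_in_a, text_in_b) pairs for every span where a and b differ.

    Hypothetical helper, not part of difflib.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is one of "equal", "replace", "delete", "insert".
        if tag != "equal":
            # a[i1:i2] is what a has at this position; b[j1:j2] is what b has instead.
            spans.append((a[i1:i2], b[j1:j2]))
    return spans


print(changed_spans("0074574-1", "0074574+1"))      # [('-', '+')]
print(changed_spans("27785-74-1", "27785%F274-1"))  # [('-', '%F2')]
```

For a pure insertion or deletion, one side of the pair is an empty string. Keep in mind that the matcher picks one plausible alignment, so for strings that differ in many places the spans may not be the minimal edit.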

Olvin Roght