
Python newbie here. I have the following code to compare two strings using the difflib library. The output prefixes the words that differ with '+' or '-'. How can I print only the differences, without any prefix?

The expected output for the code below is:

Not in first string: Nvdia

Not in first string: IBM

Not in second string: Microsoft

Not in second string: Google

Not in second string: Oracle

or just Nvdia, IBM, Microsoft, Google, Oracle

import difflib

original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

# initiate the Differ object
d = difflib.Differ()

# calculate the difference between the two texts
diff = d.compare(original.split(), edited.split())

# output the result
print('\n'.join(diff))

Thanks!

Bat Stock
  • Does this answer your question? [Python, compare two sentence by words using difflib](https://stackoverflow.com/questions/63156252/python-compare-two-sentence-by-words-using-difflib) – ades Jan 12 '22 at 21:58
  • Even better pointer: https://stackoverflow.com/a/39075165/2893408 – caram Mar 11 '23 at 00:56

1 Answer

If you don't have to use difflib, you could use a set and string splitting!

>>> original = "Apple Microsoft Google Oracle"
>>> edited = "Apple Nvdia IBM"
>>> set(original.split()).symmetric_difference(set(edited.split()))
{'IBM', 'Google', 'Oracle', 'Microsoft', 'Nvdia'}

You can also get the shared members with .intersection():

>>> set(original.split()).intersection(set(edited.split()))
{'Apple'}

Wikipedia has a good section on basic set operations, with accompanying Venn diagrams:
https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations
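
If you want the labelled output from the Question rather than one combined set, a one-sided difference in each direction should do it. This is only a sketch: `one_sided_differences` is a name I made up for illustration, and it assumes the words are separated by whitespace. The sets are used only for membership tests, so the words stay in the order they appear in each string.

```python
def one_sided_differences(original, edited):
    """Return (words only in edited, words only in original), in input order."""
    original_words = original.split()
    edited_words = edited.split()

    # Sets make the "is this word in the other string?" check cheap.
    original_set = set(original_words)
    edited_set = set(edited_words)

    not_in_first = [w for w in edited_words if w not in original_set]
    not_in_second = [w for w in original_words if w not in edited_set]
    return not_in_first, not_in_second


original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

not_in_first, not_in_second = one_sided_differences(original, edited)
for word in not_in_first:
    print(f"Not in first string: {word}")
for word in not_in_second:
    print(f"Not in second string: {word}")
```

If order doesn't matter, `set(edited.split()) - set(original.split())` gives the same words as the first list, just as an unordered set.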


However, if you have to use difflib (some strange environment or assignment), you can also just find every member with a + or - prefix and slice off the prefixes:

>>> diff = d.compare(original.split(), edited.split())
>>> list(a[2:] for a in diff if a.startswith(("+", "-")))
['Nvdia', 'IBM', 'Microsoft', 'Google', 'Oracle']
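
The same prefix check can also drive the "Not in first/second string" labels from the Question. A rough sketch, assuming the two-character prefixes `"+ "` and `"- "` that `difflib.Differ` puts on changed entries; `labelled_differences` is a hypothetical helper name, not part of difflib. Unchanged entries (two spaces) and hint lines (`"? "`) are skipped.

```python
import difflib


def labelled_differences(original, edited):
    """Label each word that Differ reports as added or removed."""
    labels = {
        "+ ": "Not in first string",   # present only in `edited`
        "- ": "Not in second string",  # present only in `original`
    }
    lines = []
    for entry in difflib.Differ().compare(original.split(), edited.split()):
        prefix, word = entry[:2], entry[2:]
        if prefix in labels:
            lines.append(f"{labels[prefix]}: {word}")
    return lines


print("\n".join(labelled_differences("Apple Microsoft Google Oracle",
                                     "Apple Nvdia IBM")))
```

Note that `compare()` returns a generator, so it can only be looped over once; collecting the results into a list as above lets you reuse them.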

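All of these operations result in an iterable of strings, so you can .join() 'em together or similar to get a single result, as you do in your Question. Here result is the set from the first example; sets are unordered, so the lines may come out in a different order for you.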

>>> print("\n".join(result))
IBM
Google
Oracle
Microsoft
Nvdia
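
If you'd rather have the same order on every run, one option is to sort before joining. A small sketch, where `symmetric_difference_lines` is just an illustrative name and `^` is the operator shorthand for `.symmetric_difference()`:

```python
def symmetric_difference_lines(original, edited):
    """Words that appear in exactly one of the two strings, sorted, one per line."""
    result = set(original.split()) ^ set(edited.split())
    return "\n".join(sorted(result))


print(symmetric_difference_lines("Apple Microsoft Google Oracle", "Apple Nvdia IBM"))
# Google
# IBM
# Microsoft
# Nvdia
# Oracle
```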
ti7
  • Thanks for your response. I do not have to use difflib. Is there any way to get the word common to both strings, in this case 'Apple'? – Bat Stock Jan 12 '22 at 22:07
  • anytime! yes, it'd be the set [intersection](https://docs.python.org/3/library/stdtypes.html#frozenset.intersection) (`&` shorthand) `set(original.split()).intersection(set(edited.split()))` `{'Apple'}` – ti7 Jan 12 '22 at 22:09
  • Fantastic, thanks again! In order to get the individual words in the set, do I have to convert it to a string and extract the words, or is there an easier way to achieve it? – Bat Stock Jan 12 '22 at 22:28
  • hmm.. the result is already an iterable of strings, so if you wanted them split into lines like in your Question, you can just `.join()` 'em all together `"\n".join(resulting_set)` or any other similar mechanism – ti7 Jan 12 '22 at 22:31
  • I updated my Answer with some more information about this, though often it's useful to have a `set` or `list` of values, or create a generator of them (as `difflib.Differ()` [yields](https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do)) rather than converting back to a string immediately (and only doing so when you want some final output or information or are debugging) – ti7 Jan 12 '22 at 22:37
  • Awesome.. thanks a lot for your time and explanation. – Bat Stock Jan 13 '22 at 07:42