
I have two sets of strings, and I want to compare the elements of set1 with the elements of set2 and output the number of matching elements. If I can avoid a loop with this, that would be preferred as well. The idea is like this:

   set1 = ['some','words','are','here']
   set2 = ['some','words','are','over','here','too']

The function I'm looking for would output 4 here, counting True for every element of set1 that is contained in set2. The equivalent in R would be

   sum(set1 %in% set2)

But I can't find an equivalent in Python. Let me know if any of you guys can help. Cheers

fattmagan
  • `print(len(set(set1).intersection(set(set2))))` are you looking for this? – arshovon Oct 11 '17 at 05:30
  • You have lists there, not sets. Can either of those lists contain duplicated strings? – PM 2Ring Oct 11 '17 at 05:31
  • YES! Thank you so much. I tried searching this so hard and couldn't find anything – fattmagan Oct 11 '17 at 05:32
  • https://stackoverflow.com/a/642919/365102 – Mateen Ulhaq Oct 11 '17 at 05:33
  • @PM2Ring yeah the terminology with Python has been hard for me in coming over from R, sorry about that. Yes there can be duplicates, and I want those to count in the final sum – fattmagan Oct 11 '17 at 05:33
  • Ok, how do you want to handle those dupes? Python sets (like the sets in mathematics) cannot contain dupes. It would help if you gave some example lists containing dupes, and show what output you expect from them. – PM 2Ring Oct 11 '17 at 05:36
  • @fattmagan were you actually using a set library in R? I don't recall R providing built-in set objects. I believe you were using *vectors* in R. – juanpa.arrivillaga Oct 11 '17 at 05:41
  • @juanpa.arrivillaga that is what I meant. The terminology in Python has been messing with me with its sets and tuples and lists. Yes, I want to compare two _lists_ in Python, akin to two _vectors_ in R. – fattmagan Oct 11 '17 at 05:56

1 Answer


First, you do not have set objects; you have list objects:

>>> set1 = ['some','words','are','here']
>>> set2 = ['some','words','are','over','here','too']
>>> type(set1), type(set2)
(<class 'list'>, <class 'list'>)
>>>

Python supports set literals, which are written with curly braces:

>>> set1 = {'some','words','are','here'}
>>> set2 = {'some','words','are','over','here','too'}
>>> type(set1), type(set2)
(<class 'set'>, <class 'set'>)
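Because the comments raise the question of duplicates, here is a short illustration of what a set does with them: a set stores each distinct value only once, whether it is built from a literal or from a list with `set()`. It also shows that `{}` by itself is an empty dict, so an empty set has to be written `set()`.

```python
# A list keeps every occurrence; a set stores each distinct value once.
words = ['some', 'words', 'are', 'here', 'here']   # 'here' appears twice
unique = {'some', 'words', 'are', 'here', 'here'}  # the duplicate item is dropped

print(len(words))            # 5
print(len(unique))           # 4
print(set(words) == unique)  # True: set() deduplicates the same way

# {} is an empty dict, not an empty set; use set() for an empty set.
print(type({}), type(set()))  # <class 'dict'> <class 'set'>
```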

Python set objects overload the bitwise operators to perform set operations. You want the number of elements in the intersection, so use the bitwise-and operator `&`:

>>> set1 & set2
{'are', 'here', 'words', 'some'}
>>> len(set1 & set2)
4

Alternatively, you can use a more object-oriented style:

>>> set1.intersection(set2)
{'are', 'here', 'words', 'some'}
>>> len(set1.intersection(set2))
4
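One practical difference between the two styles: the named method accepts any iterable as its argument, while the `&` operator only works when both operands are sets (or frozensets). A small sketch using the lists from the question:

```python
l1 = ['some', 'words', 'are', 'here']
l2 = ['some', 'words', 'are', 'over', 'here', 'too']
s1 = set(l1)

# The named method accepts any iterable, so l2 can stay a list.
print(len(s1.intersection(l2)))  # 4

# The & operator needs a set on both sides; a list raises TypeError.
operator_failed = False
try:
    s1 & l2
except TypeError:
    operator_failed = True
print(operator_failed)  # True

# Converting the list first makes the operator form work.
print(len(s1 & set(l2)))  # 4
```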

I prefer the operators, personally:

>>> set1 & set2 # intersection
{'are', 'here', 'words', 'some'}
>>> set1 | set2 # union
{'some', 'here', 'words', 'too', 'over', 'are'}
>>> set1 - set2 # difference
set()
>>> set2 - set1 # difference
{'too', 'over'}
>>> set2 ^ set1 # symmetric difference
{'over', 'too'}
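Each of those operators has a named-method counterpart that gives the same result for set operands, and an empty `set1 - set2` means `set1` is a subset of `set2`, which `<=` or `issubset()` checks directly:

```python
set1 = {'some', 'words', 'are', 'here'}
set2 = {'some', 'words', 'are', 'over', 'here', 'too'}

# Each operator has a named method that gives the same result for set operands.
assert (set1 & set2) == set1.intersection(set2)
assert (set1 | set2) == set1.union(set2)
assert (set2 - set1) == set2.difference(set1)
assert (set2 ^ set1) == set2.symmetric_difference(set1)

# An empty set1 - set2 means set1 is a subset of set2;
# <= and issubset() test that relationship directly.
print(set1 <= set2)         # True
print(set1.issubset(set2))  # True
```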

If you have list objects, just convert to a set:

>>> l1 = ['some','words','are','here']
>>> l2 = ['some','words','are','over','here','too']
>>> set(l1).intersection(l2)
{'some', 'are', 'words', 'here'}
>>> len(set(l1).intersection(l2))
4
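Note that converting to sets collapses duplicates, so `len(set(l1) & set(l2))` counts each distinct match only once. The comments say duplicates in the first list should count, which matches R's `sum(x %in% y)`: one check per element of `x`. A minimal sketch of that behaviour, assuming each occurrence in the first list counts once if its value appears anywhere in the second list (the lists below are made-up examples):

```python
# Hypothetical example lists: 'here' is duplicated and 'missing' is absent from l2.
l1 = ['some', 'words', 'are', 'here', 'here', 'missing']
l2 = ['some', 'words', 'are', 'over', 'here', 'too']

lookup = set(l2)  # build once so each membership test is a hash lookup

# Like R's l1 %in% l2: one boolean per element of l1, duplicates included.
mask = [word in lookup for word in l1]
print(mask)  # [True, True, True, True, True, False]

# Like R's sum(l1 %in% l2): True counts as 1 when summed.
count = sum(word in lookup for word in l1)
print(count)  # 5

# The set intersection counts 'here' only once.
print(len(set(l1) & lookup))  # 4
```

The generator expression is still a loop, just a compact one; building `lookup` as a set keeps each `in` check fast on average. If you are already using NumPy, `numpy.isin(l1, l2).sum()` is the closer vectorized analogue of the R expression.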
juanpa.arrivillaga