
What would be the best way to do the following case-insensitive intersection:

a1 = ['Disney', 'Fox']
a2 = ['paramount', 'fox']
a1.intersection(a2)
> ['fox']

Normally I'd use a list comprehension to lowercase both lists before intersecting them:

>>> set([_.lower() for _ in a1]).intersection(set([_.lower() for _ in a2]))
set(['fox'])

but it's a bit ugly. Is there a better way to do this?

David542
  • Not really; about the best you can do is to convert everything to lower case in one pass, and keep it that way for the remainder of your processing. – Prune Oct 12 '18 at 16:20
  • https://stackoverflow.com/questions/1479979/case-insensitive-comparison-of-sets-in-python – mad_ Oct 12 '18 at 16:23
  • Similar question: [Case-insensitive comparison of sets in Python](https://stackoverflow.com/q/1479979/674039). Not a great dupe because that one is about `frozenset`, and the set literal syntax is not available. – wim Oct 12 '18 at 16:30

2 Answers


Using the set comprehension syntax is slightly less ugly:

>>> {str.casefold(x) for x in a1} & {str.casefold(x) for x in a2}
{'fox'}

The algorithm is the same, and no more efficient approach is available: string hashes are case-sensitive, so 'Fox' and 'fox' are unrelated set members until each element has been normalized.
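If the one-liner is needed in several places, it can be wrapped in a small helper. This is only a sketch: the name intersect_ignore_case is hypothetical, and it assumes every element is a str.

```python
def intersect_ignore_case(*iterables):
    """Return the case-folded strings common to every iterable.

    Hypothetical helper: each input is normalized exactly once,
    then the ordinary set intersection does the rest.
    """
    folded = [{s.casefold() for s in it} for it in iterables]
    if not folded:
        return set()
    return set.intersection(*folded)


a1 = ['Disney', 'Fox']
a2 = ['paramount', 'fox']
print(intersect_ignore_case(a1, a2))  # {'fox'}
```

Note that the result contains the normalized strings, not the original spellings.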

Using str.casefold instead of str.lower behaves more correctly for international data. It has been available since Python 3.3.
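A quick illustration of where the two differ, using the German sharp s as sample data (the strings below are made up for the example):

```python
a1 = ['Straße', 'Fox']
a2 = ['STRASSE', 'fox']

# str.lower leaves 'ß' alone, so 'straße' and 'strasse' do not match.
with_lower = {s.lower() for s in a1} & {s.lower() for s in a2}

# str.casefold maps 'ß' to 'ss', so both spellings become 'strasse'.
with_casefold = {s.casefold() for s in a1} & {s.casefold() for s in a2}

print(with_lower)                    # {'fox'}
print(sorted(with_casefold))         # ['fox', 'strasse']
```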

wim

The problem is under-specified. If the same string appears with two different casings, either twice within one set or once in each set, which spelling should the result keep?
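One possible policy, shown here purely as an assumption and not something the question specifies, is to keep the spelling from the first list and drop later duplicates. The function name intersect_keep_first is hypothetical.

```python
def intersect_keep_first(first, second):
    """Case-insensitive intersection that returns spellings taken from `first`.

    Assumed policy: the first spelling seen in `first` wins; later
    case variants of the same string are skipped.
    """
    folded_second = {s.casefold() for s in second}
    seen = set()
    result = []
    for s in first:
        key = s.casefold()
        if key in folded_second and key not in seen:
            seen.add(key)
            result.append(s)
    return result


print(intersect_keep_first(['Disney', 'Fox', 'FOX'], ['paramount', 'fox']))  # ['Fox']
```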

That said, if you don't care which spelling survives and you need to perform this kind of intersection many times, you can create a case-insensitive string wrapper:

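class StrIgnoreCase:
    def __init__(self, val):
        self.val = val  # original spelling, kept unchanged

    def __eq__(self, other):
        # Only compare against other wrapped strings.
        if not isinstance(other, StrIgnoreCase):
            return False

        return self.val.lower() == other.val.lower()

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.val.lower())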

I would then keep both sets populated with these objects instead of plain strings. That requires fewer conversions each time a new set is created and each time an intersection is computed.
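A minimal usage sketch, with the class repeated so the snippet runs on its own. The __repr__ method is an addition for readable output and is not part of the answer above. Which of the two equal objects ends up in the result is an implementation detail of set, so the example only inspects the lowercased value.

```python
class StrIgnoreCase:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        if not isinstance(other, StrIgnoreCase):
            return False
        return self.val.lower() == other.val.lower()

    def __hash__(self):
        return hash(self.val.lower())

    def __repr__(self):  # added for illustration only
        return f"StrIgnoreCase({self.val!r})"


# Wrap once, then reuse the sets for as many intersections as needed.
s1 = {StrIgnoreCase(s) for s in ['Disney', 'Fox']}
s2 = {StrIgnoreCase(s) for s in ['paramount', 'fox']}

common = s1 & s2
print([x.val.lower() for x in common])  # ['fox']
```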

Barak Itkin