1

I'm using Python 3.7. How can I build a set(), whose elements would be strings, that is case-insensitive? That is, if I tried these operations ...

s = caseInsensitiveSet()
s.add("ABC")
s.add("abc")

The result would be a set of size one with the lone element "ABC".

satish
  • You would then lose the state of their original form. – satish Apr 01 '20 at 22:56
  • Please clarify that requirement in the question. – Carcigenicate Apr 01 '20 at 22:57
  • Do you actually want a set where `"abc" in s` and `"ABC" in s` are both true, regardless of whether `"abc"` or `"ABC"` is the actual element in the set? – chepner Apr 01 '20 at 23:02
  • [How to get Case Insensitive Python SET](https://stackoverflow.com/q/27531211/674039) – wim Apr 01 '20 at 23:11
  • @wim, Thank you although the solution you posted seems like it only filters items after they are in the set -- in other words, it appears the set could contain both "ABC" and "abc", but it is the filtering that is screening the results. I would prefer at any one moment the set not contain case-insensitively identical elements. – satish Apr 02 '20 at 00:59
  • This gets tricky, because the set doesn't really have control over how two values are considered equal; that's up to the items being stored. You would need to define a new string type which ignores case when computing both hash values and when comparing two strings for equality, or define an entirely new set type that works only with strings and takes care of hashing itself. – chepner Apr 04 '20 at 01:27
  • @satish: no, values are case-folded *as you store them in the set*. When storing a value, the case-folded value is used to determine the unique value, the original as the 'display' value. Storing `'ABC'` causes the set to *display* `'ABC'`, but it stores this under the key `'abc'`, as that's the canonical case-folded form. If you tried to store `'abc'` too, it *replaces* the display value, under the same key `'abc'`. It will never store *both*. In other words: you seem to have misunderstood what my implementation there actually *does*. – Martijn Pieters Apr 07 '20 at 20:51
  • @satish I think your question is now more or less answered on the question I've linked earlier. That one was not an acceptable answer at the time, otherwise I would have just closed as dupe, but the issues have since been addressed. Try it out and let us know if there is anything missing? – wim Apr 10 '20 at 19:09
  • Duplicate of https://stackoverflow.com/questions/27531211/how-to-get-case-insensitive-python-set https://stackoverflow.com/questions/53780519/how-to-make-a-python-set-case-insensitive – nCessity Apr 10 '20 at 21:56
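
The first option chepner mentions above, a string type that ignores case for both hashing and equality, might look roughly like the sketch below. `CaseInsensitiveStr` is a hypothetical name (nothing in the standard library provides it), and the sketch assumes every element is wrapped before it goes into the set:

```python
class CaseInsensitiveStr(str):
    """Hypothetical str subclass that hashes and compares by its case-folded form."""

    def __hash__(self):
        # Equal objects must hash equally, so hash the case-folded text.
        return hash(self.casefold())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.casefold() == other.casefold()
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it has to be overridden as well.
        if isinstance(other, str):
            return self.casefold() != other.casefold()
        return NotImplemented


s = set()
s.add(CaseInsensitiveStr("ABC"))
s.add(CaseInsensitiveStr("abc"))  # compares equal to "ABC", so nothing is added
print(len(s))  # 1
```

The original spelling `"ABC"` is what stays in the set. The drawback is that a plain `str` such as `"Abc"` hashes differently, so lookups are only reliable when the probe is wrapped too, e.g. `CaseInsensitiveStr("Abc") in s`.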

2 Answers

0

Just override the add method of your set.

from collections.abc import MutableSet  # the collections.MutableSet alias is deprecated and removed in Python 3.10


class CasePreservingSet(MutableSet):
    def __init__(self, *values):
        self._values = {}
        self._fold = str.casefold
        for v in values:
            self.add(v)

    def __contains__(self, value):
        return self._fold(value) in self._values

    def __iter__(self):
        return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def add(self, value):
        self._values[self._fold(value)] = value

    def discard(self, value):
        try:
            del self._values[self._fold(value)]
        except KeyError:
            pass

Example:

In [1]: from caseinsensite import CasePreservingSet                                                                                                            

In [2]: s = CasePreservingSet()                                                                                                                                

In [3]: s.add("ABC")                                                                                                                                           

In [4]: s.add("abc")                                                                                                                                           

In [5]: list(s)                                                                                                                                                
Out[5]: ['abc']

In [6]: len(s)                                                                                                                                                 
Out[6]: 1

If you want to keep the case of the first element entered, use this `add` method instead:

    def add(self, value):
        if self._fold(value) not in self._values:
            self._values[self._fold(value)] = value

Example:

In [1]: from caseinsensite import CasePreservingSet                                                                                                            

In [2]: s = CasePreservingSet()                                                                                                                                

In [3]: s.add("ABC")                                                                                                                                           

In [4]: s.add("aBc")                                                                                                                                           

In [5]: list(s)                                                                                                                                                
Out[5]: ['ABC']

Adapted from https://stackoverflow.com/a/27531275/8135079

matt.LLVW
  • Thanks for this code. Where in my project am I placing this file? – satish Apr 04 '20 at 16:32
  • https://stackoverflow.com/questions/2349991/how-to-import-other-python-files or you could just place the code at the top of the file where you need this. – matt.LLVW Apr 07 '20 at 12:21
  • This should have been a comment and not an answer. Do not duplicate entire answers from other questions. If it is a duplicate question then [vote to close](https://stackoverflow.com/help/privileges/close-questions) as such and/or leave a comment once you [earn](http://meta.stackoverflow.com/q/146472) enough [reputation](http://stackoverflow.com/help/whats-reputation). If the question is not a duplicate then [edit] the post and tailor the answer to this specific question. – Martijn Pieters Apr 07 '20 at 20:49
  • And yes, that’s my answer you copied (you only removed the Python 2 compatibility layer); presumably you found this via the link wim added in a comment to the question? To earn a bounty you’d normally do *more* than just take the work others have already done. – Martijn Pieters Apr 07 '20 at 22:45
  • and that's what i did. i offered a way to preserve the first entered case or the last. – matt.LLVW Apr 08 '20 at 09:11
  • @matt.LLVW Unfortunately you copied the bugs too :) This code crashes with normal set operations e.g. `CasePreservingSet() | {"a"}` raises `TypeError: descriptor 'casefold' for 'str' objects doesn't apply to a 'generator' object`. – wim Apr 10 '20 at 19:11
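
The crash wim describes happens because the inherited set operators build their result through `MutableSet._from_iterable`, which calls the class with a single iterable argument, while the `__init__` above expects the values unpacked. A minimal sketch of one way around it, assuming the constructor is changed to take an iterable (the class name `CaseInsensitiveSet` is made up for this example, and it keeps the first spelling seen):

```python
from collections.abc import MutableSet


class CaseInsensitiveSet(MutableSet):
    """Sketch: a set of strings that treats case variants as one element."""

    def __init__(self, iterable=()):
        # Maps case-folded key -> first spelling seen for that key.
        self._values = {}
        for value in iterable:
            self.add(value)

    def __contains__(self, value):
        return isinstance(value, str) and value.casefold() in self._values

    def __iter__(self):
        return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        return f"{type(self).__name__}({list(self)!r})"

    def add(self, value):
        # setdefault leaves an existing entry untouched, so the first spelling wins.
        self._values.setdefault(value.casefold(), value)

    def discard(self, value):
        self._values.pop(value.casefold(), None)


s = CaseInsensitiveSet(["ABC", "abc"])
u = s | {"Abc", "xyz"}  # goes through _from_iterable, which now works
print(len(s), sorted(u))  # 1 ['ABC', 'xyz']
```

Taking an iterable also matches how the built-in `set()` is constructed, so the mixin methods from `collections.abc` (`|`, `&`, `-`, `^`) can create new instances without extra overrides.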
-2

Perhaps just use a dict?

s = {}
s.setdefault("ABC".lower(), "ABC")
s.setdefault("abc".lower(), "abc")
s.values()
# equivalent to {'ABC'}

Write a wrapper class if you want and you're off to the races.
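
As a rough illustration of the same dict idea, the sketch below wraps the `setdefault` call in a helper (`add_casefolded` is a hypothetical name) and keys on `str.casefold()` instead of `str.lower()`. That substitution is an assumption on my part rather than something this answer proposes; it matters for text where lowercasing alone does not unify the spellings:

```python
def add_casefolded(store, value):
    """Hypothetical helper: keep the first spelling seen for each case-folded key."""
    return store.setdefault(value.casefold(), value)


s = {}
add_casefolded(s, "Straße")
add_casefolded(s, "STRASSE")  # same case-folded key, so the first spelling stays
print(list(s.values()))  # ['Straße']

# With str.lower() these two spellings would end up under different keys.
print("Straße".lower() == "STRASSE".lower())        # False
print("Straße".casefold() == "STRASSE".casefold())  # True
```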

Eevee
  • How is this better than `s.add("ABC".upper())` and `s.add("abc".upper())`? – chepner Apr 01 '20 at 22:50
  • @chepner it remembers the first seen casing, which seemed important since the asker called it out explicitly – Eevee Apr 03 '20 at 04:01