Case insensitive dictionary

Question

I'd like my dictionary to be case insensitive.

I have this example code:

text = "practice changing the color"

words = {'color': 'colour',
        'practice': 'practise'}

def replace(words,text):

    keys = words.keys()

    for i in keys:
        text= text.replace(i ,words[i])
    return  text

text = replace(words,text)

print(text)

Output = practise changing the colour

I'd like another string, "practice changing the Color", (where Color starts with a capital) to also give the same output.

I believe there is a general way to convert to lowercase using mydictionary[key.lower()] but I'm not sure how to best integrate this into my existing code. (If this would be a reasonable, simple approach anyway).

@NickT This PEP has been rejected. https://www.python.org/dev/peps/pep-0455/#rejection — user1556435, Apr 08 '16 at 15:17
I am probably missing the point, but in a simplistic way: ```words = {'color': 'colour', 'Color': 'Colour', 'practice': 'practise', 'Practice': 'Practise'}```. The most obvious problems that you are going to run into are changing partial words that do not need changing (“Technicolor”) or switching the British English noun form “practice” to the verb form “practise”. Need a proofreader, really. – typonaut, Nov 12 '22 at 15:35

score 85 · Answer 1 · edited Jan 07 '22 at 13:58

The currently accepted answer wouldn't work for lots of cases, so it cannot be used as a drop-in dict replacement. Some tricky points in getting a proper dict replacement:

overloading all of the methods that involve keys
properly handling non-string keys
properly handling the constructor of the class

The following should work much better:

class CaseInsensitiveDict(dict):
    @classmethod
    def _k(cls, key):
        return key.lower() if isinstance(key, basestring) else key

    def __init__(self, *args, **kwargs):
        super(CaseInsensitiveDict, self).__init__(*args, **kwargs)
        self._convert_keys()
    def __getitem__(self, key):
        return super(CaseInsensitiveDict, self).__getitem__(self.__class__._k(key))
    def __setitem__(self, key, value):
        super(CaseInsensitiveDict, self).__setitem__(self.__class__._k(key), value)
    def __delitem__(self, key):
        return super(CaseInsensitiveDict, self).__delitem__(self.__class__._k(key))
    def __contains__(self, key):
        return super(CaseInsensitiveDict, self).__contains__(self.__class__._k(key))
    def has_key(self, key):
        return super(CaseInsensitiveDict, self).has_key(self.__class__._k(key))
    def pop(self, key, *args, **kwargs):
        return super(CaseInsensitiveDict, self).pop(self.__class__._k(key), *args, **kwargs)
    def get(self, key, *args, **kwargs):
        return super(CaseInsensitiveDict, self).get(self.__class__._k(key), *args, **kwargs)
    def setdefault(self, key, *args, **kwargs):
        return super(CaseInsensitiveDict, self).setdefault(self.__class__._k(key), *args, **kwargs)
    def update(self, E={}, **F):
        super(CaseInsensitiveDict, self).update(self.__class__(E))
        super(CaseInsensitiveDict, self).update(self.__class__(**F))
    def _convert_keys(self):
        for k in list(self.keys()):
            v = super(CaseInsensitiveDict, self).pop(k)
            self.__setitem__(k, v)
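
For readers on Python 3, the comments on this answer point out that `basestring` no longer exists and that `has_key` is redundant. A minimal sketch of how the class above might be adapted under those two changes follows. The switch to a `staticmethod`, the zero-argument `super()`, and routing `__init__` and `update` through `__setitem__` are simplifications assumed here, not part of the original answer.

```python
class CaseInsensitiveDict(dict):
    """Python 3 sketch: str replaces basestring and has_key is dropped."""

    @staticmethod
    def _k(key):
        # Only strings are normalised; other hashable keys pass through untouched.
        return key.lower() if isinstance(key, str) else key

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return super().__getitem__(self._k(key))

    def __setitem__(self, key, value):
        super().__setitem__(self._k(key), value)

    def __delitem__(self, key):
        super().__delitem__(self._k(key))

    def __contains__(self, key):
        return super().__contains__(self._k(key))

    def get(self, key, default=None):
        return super().get(self._k(key), default)

    def pop(self, key, *args):
        return super().pop(self._k(key), *args)

    def setdefault(self, key, default=None):
        return super().setdefault(self._k(key), default)

    def update(self, *args, **kwargs):
        # Route every incoming pair through __setitem__ so keys are normalised.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


words = CaseInsensitiveDict({'Color': 'colour', 'practice': 'practise'})
print(words['COLOR'])       # colour
print('Practice' in words)  # True
```

Keys are stored in lowercase, so the original spelling is not preserved when iterating over the dictionary.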

edited Jan 07 '22 at 13:58

martineau

answered Oct 01 '15 at 13:16

m000

3

This is great, but there is one minor problem. The super definition of `update` is `update(self, E=None, **F)`, meaning `E` is optional. You've re-defined it to make `E` required. Add in the `=None` and this will be perfect. – Nick Williams Nov 16 '15 at 14:44
Nice catch @NickWilliams. Thanks! – m000 Nov 16 '15 at 22:58
50

Python is easy, they said. Python is fun, they said. – rr- Nov 29 '15 at 20:59
12

@rr-. To be totally fair, imagine doing this in say C. – Mad Physicist Jul 05 '16 at 14:38
1

@MadPhysicist Not sure, but it should be straightforward to add. Just modify `_k()` to also normalise as desired. – m000 Jul 05 '16 at 18:48
return super(CaseInsensitiveDict, self).setdefault(self.__class__._k(key), *args, **kwargs) TypeError: super(type, obj): obj must be an instance or subtype of type – Denny Weinberg Sep 27 '16 at 09:05
12

In python 3 the abstract type `basestring` was removed. `str` can be used as a replacement. – Jan Schatz Jan 19 '18 at 09:47
Great answer, and it still works in Python 3, as long as you change basestring to str. I have modified it slightly to support tuples. If a tuple is used as a key, it will match the tuple regardless of case of the items in the tuple (even if nested). It's too long to post as a comment. Basically, if it's a tuple, it calls a method that converts all strings in the tuple to lowercase recursively. – Troy Hoffman Dec 26 '18 at 20:08
for those who need python2/3 compatibility with respect to basestring - you can see this answer as a basis for modifying the `_k` method: https://stackoverflow.com/a/22679982 – ara.hayrabedian May 22 '19 at 12:43
what about encoding every key to its base32 representation? Will that ensure consistency across any encoding the string is in? – Itamar Jun 20 '19 at 18:01
1

Very similar to what I did here: https://stackoverflow.com/a/43457369/281545 - main (important) difference is that I retain case info, but this needs a dedicated string class – Mr_and_Mrs_D Oct 02 '20 at 13:12
ideally, a true case-insensitive dictionary would also be case-preserving of the last setter. ie: this is collation only, not data-loss – Erik Aronesty Mar 03 '21 at 22:32
In Python 3, implementing a dictionary subclass would be a lot simpler if its base class was the abstract base class [`collections.abc.MutableMapping`](https://docs.python.org/3/library/collections.abc.html?highlight=mutablemapping#collections-abstract-base-classes) because doing so would *greatly* reduce the number of methods that needed to be implemented. – martineau Jan 07 '22 at 14:05
1

For python3 only remove the def has_key overload - the existing def __contains__ covers it. – Ricibob Sep 20 '22 at 14:41
@m000 Please excuse my Python ignorance, but why is _k defined as a class method and then all the calls use that ugly `self.__class__._k(key)` invocation? Is this trying to avoid overriding in a further subclass or something? – jschultz410 Jan 26 '23 at 14:29
1

@jschultz410 Overriding in a subclass should work fine. Going through `__class__` makes sure that the instances of the class will behave the same way. Otherwise you could redefine `_k()` per instance and have instances of CaseInsensitiveDict that don't really behave the same way. Since this is Python, you can still screw things up if you try hard enough, so this just puts the screw-up barrier a bit higher. – m000 Jan 27 '23 at 16:01

score 81 · Answer 2 · answered May 29 '13 at 15:24

Just for the record, I found an awesome implementation in Requests:

https://github.com/kennethreitz/requests/blob/v1.2.3/requests/structures.py#L37
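
The idea behind an implementation of this kind can be sketched in a few lines. The class below is not the Requests source. It is a simplified illustration, assuming Python 3 and string-only keys, that stores each entry under its lowercase key together with the key as it was last written. The names `SimpleCaseInsensitiveDict` and `_store` are chosen here for illustration.

```python
from collections.abc import MutableMapping


class SimpleCaseInsensitiveDict(MutableMapping):
    """Case-insensitive lookups that remember the case of the last key set."""

    def __init__(self, data=None, **kwargs):
        # Maps lowercase key -> (key as last written, value).
        self._store = {}
        self.update(data or {}, **kwargs)

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Iteration yields the keys with their original case.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return repr(dict(self.items()))


headers = SimpleCaseInsensitiveDict({'Content-Type': 'text/html'})
print(headers['content-type'])    # text/html
print('CONTENT-TYPE' in headers)  # True
print(list(headers))              # ['Content-Type']
```

Because `MutableMapping` derives `get`, `pop`, `setdefault`, `update`, and `in` from the five methods defined here, none of them needs a separate override.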

answered May 29 '13 at 15:24

santiagobasulto

26

`from requests.structures import CaseInsensitiveDict` – JimB May 08 '16 at 20:55
15

That might work, but if what you need is just a Case Insensitive Dict, it's silly to add requests as a dependency just for that. – santiagobasulto May 09 '16 at 13:33
15

@santiagobasulto - it's "silly" until the (need-it-to-work/time-till-deadline) ratio goes to infinity – qneill Aug 16 '20 at 20:57
If you need just the CaseInsensitiveDict you are free to lift just the source-code of that class from the requests project and insert it into your project provided you retain the license properly etc. – Donal Sep 28 '22 at 19:51
Note: The structure remembers the case of the last key to be set, and ``iter(instance)``, ``keys()``, ``items()``, ``iterkeys()``, and ``iteritems()`` will contain case-sensitive keys. However, querying and contains testing is case insensitive – Hritik Jan 05 '23 at 18:50

score 48 · Accepted Answer · answered Jan 17 '10 at 18:50

If I understand you correctly and you want a way to key dictionaries in a non case-sensitive fashion, one way would be to subclass dict and overload the setter / getter:

class CaseInsensitiveDict(dict):
    def __setitem__(self, key, value):
        super(CaseInsensitiveDict, self).__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super(CaseInsensitiveDict, self).__getitem__(key.lower())
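
A short check, assuming CPython's built-in `dict` behaviour, shows which operations these two overrides cover and which still bypass them. The class is repeated (with Python 3's zero-argument `super()`) only so that the snippet runs on its own.

```python
class CaseInsensitiveDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


d = CaseInsensitiveDict()
d['Color'] = 'colour'
print(d['COLOR'])      # colour  (both overrides are used)
print('COLOR' in d)    # False   (__contains__ is not overridden)
print(d.get('COLOR'))  # None    (dict.get does not call __getitem__)

e = CaseInsensitiveDict({'Color': 'colour'})
print(list(e))         # ['Color'] (the constructor does not call __setitem__)
```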

answered Jan 17 '10 at 18:50

jkp

1

Isn't there a special builtin that is called for 'in' as well? – Omnifarious Jan 17 '10 at 19:00
30

Here is a complete list of methods that may need overloading: __setitem__, __getitem__, __contains__, get, has_key, pop, setdefault, and update. __init__ and fromkeys should also possibly be overloaded to make sure the dictionary is initialized properly. Maybe I'm wrong and somewhere Python promises that get, has_key, pop, setdefault, update and __init__ will be implemented in terms of __getitem__, __setitem__ and __contains__ if they've been overloaded, but I don't think so. – Omnifarious Jan 17 '10 at 19:08
5

added `__contains__, get, and has_key` to the answer since I ended up coding them :) – Michael Merchant Apr 08 '11 at 23:29
11

This solution is very limited as it doesn't work for a **lot** of common uses of `dict`. **Don't use it in your code - it will break all but the simplest uses.** Apparently @MichaelMerchant attempted to add the missing stuff, but moderation disapproved the changes (same thing happened to me). I added a new answer which should be usable as a drop-in `dict` replacement [here](http://stackoverflow.com/a/32888599/277172). – m000 Oct 01 '15 at 13:26
Like the others said, setdefault, for example, is broken! "descriptor 'setdefault' requires a 'dict' object but received a 'str'" – Denny Weinberg Sep 27 '16 at 09:01
3

Better off subclassing `UserDict` than `dict` https://docs.python.org/3.5/library/collections.html#userdict-objects – rite2hhh Aug 30 '19 at 19:37

score 20 · Answer 4 · answered May 13 '15 at 17:26

In my particular instance, I needed a case insensitive lookup, however, I did not want to modify the original case of the key. For example:

>>> d = {}
>>> d['MyConfig'] = 'value'
>>> d['myconfig'] = 'new_value'
>>> d
{'MyConfig': 'new_value'}

You can see that the dictionary still has the original key, however it is accessible case-insensitively. Here's a simple solution:

class CaseInsensitiveKey(object):
    def __init__(self, key):
        self.key = key
    def __hash__(self):
        return hash(self.key.lower())
    def __eq__(self, other):
        return self.key.lower() == other.key.lower()
    def __str__(self):
        return self.key

The __hash__ and __eq__ overrides are required for both getting and setting entries in the dictionary. This is creating keys that hash to the same position in the dictionary if they are case-insensitively equal.

Now either create a custom dictionary that initializes a CaseInsensitiveKey using the provided key:

class CaseInsensitiveDict(dict):
    def __setitem__(self, key, value):
        key = CaseInsensitiveKey(key)
        super(CaseInsensitiveDict, self).__setitem__(key, value)
    def __getitem__(self, key):
        key = CaseInsensitiveKey(key)
        return super(CaseInsensitiveDict, self).__getitem__(key)

or simply make sure to always pass an instance of CaseInsensitiveKey as the key when using the dictionary.
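
The second option can be tried directly with a plain `dict`. This is a small sketch assuming Python 3. It uses `casefold()` rather than `lower()`, as suggested in a comment on this answer, and the `__repr__` method and the `NotImplemented` guard are additions made here for illustration.

```python
class CaseInsensitiveKey:
    def __init__(self, key):
        self.key = key

    def __hash__(self):
        # Keys that differ only by case must land in the same hash bucket.
        return hash(self.key.casefold())

    def __eq__(self, other):
        if not isinstance(other, CaseInsensitiveKey):
            return NotImplemented
        return self.key.casefold() == other.key.casefold()

    def __repr__(self):
        return repr(self.key)


d = {}
d[CaseInsensitiveKey('MyConfig')] = 'value'
d[CaseInsensitiveKey('myconfig')] = 'new_value'
print(d)  # {'MyConfig': 'new_value'}
```

The first key object stays in the dictionary because assigning to an equal key replaces only the value.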

Nice, thanks! :) (Note that this class doesn't implement the case-insensitive "dict(iterable)" constructor so if you need it you have to add it) — Joril, Mar 27 '18 at 09:15
You should use `.casefold()` instead of `.lower()` for comparisons, `self.key.casefold() == other.key.casefold()`, to allow `"ß"` and `"ss"` to equate as true, among others. — AJNeufeld, Sep 24 '19 at 20:00

score 14 · Answer 5 · answered Jan 17 '10 at 22:47

Would you consider using string.lower() on your inputs and using a fully lowercase dictionary? It's a bit of a hacky solution, but it works
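
Applied to the question's example, that could look roughly like the sketch below. It assumes Python 3, that every key in `words` is already lowercase, and that replacing whole whitespace-separated tokens is acceptable. The function name `replace_words` is made up for this illustration.

```python
words = {'color': 'colour', 'practice': 'practise'}


def replace_words(words, text):
    # Look each token up by its lowercase form; unknown tokens are kept as-is.
    return ' '.join(words.get(token.lower(), token) for token in text.split())


print(replace_words(words, "practice changing the Color"))
# practise changing the colour
```

Tokens with attached punctuation (for example "Color,") would not match, and runs of whitespace are collapsed to single spaces.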

answered Jan 17 '10 at 22:47

inspectorG4dget

It's a bit hacky, but I think it is along the lines of what Kim was after. – John Y Jan 18 '10 at 07:42
This is not hacky. In fact, this is less error-prone than overriding the dictionary class. – Saher Ahwal Oct 15 '16 at 23:24
5

This is great unless you want to preserve the original case when setting a key the first time. – Daniel Roethlisberger Nov 26 '16 at 16:01
2

use `string.casefold()` instead – Erik Aronesty Mar 03 '21 at 22:34
1

@ErikAronesty link to docs: https://docs.python.org/library/stdtypes.html#str.casefold – Boris Verkhovskiy Jul 05 '21 at 01:44

mloskot · Answer 6 · 2018-06-28T13:14:44.427

I've modified the simple yet good solution by pleasemorebacon (thanks!), making it slightly more compact and self-contained, with minor updates to allow construction from {'a':1, 'B':2} and to support the __contains__ protocol. Finally, since CaseInsensitiveDict.Key is expected to be a string (what else can be case-sensitive or not?), it is a good idea to derive the Key class from str; it is then possible, for instance, to dump a CaseInsensitiveDict with json.dumps out of the box.

# caseinsensitivedict.py
class CaseInsensitiveDict(dict):

    class Key(str):
        def __init__(self, key):
            str.__init__(key)
        def __hash__(self):
            return hash(self.lower())
        def __eq__(self, other):
            return self.lower() == other.lower()

    def __init__(self, data=None):
        super(CaseInsensitiveDict, self).__init__()
        if data is None:
            data = {}
        for key, val in data.items():
            self[key] = val
    def __contains__(self, key):
        key = self.Key(key)
        return super(CaseInsensitiveDict, self).__contains__(key)
    def __setitem__(self, key, value):
        key = self.Key(key)
        super(CaseInsensitiveDict, self).__setitem__(key, value)
    def __getitem__(self, key):
        key = self.Key(key)
        return super(CaseInsensitiveDict, self).__getitem__(key)

Here is a basic test script for those who like to check things in action:

# test_CaseInsensitiveDict.py
import json
import unittest
from caseinsensitivedict import *

class Key(unittest.TestCase):
    def setUp(self):
        self.Key = CaseInsensitiveDict.Key
        self.lower = self.Key('a')
        self.upper = self.Key('A')

    def test_eq(self):
        self.assertEqual(self.lower, self.upper)

    def test_hash(self):
        self.assertEqual(hash(self.lower), hash(self.upper))

    def test_str(self):
        self.assertEqual(str(self.lower), 'a')
        self.assertEqual(str(self.upper), 'A')

class Dict(unittest.TestCase):
    def setUp(self):
        self.Dict = CaseInsensitiveDict
        self.d1 = self.Dict()
        self.d2 = self.Dict()
        self.d1['a'] = 1
        self.d1['B'] = 2
        self.d2['A'] = 1
        self.d2['b'] = 2

    def test_contains(self):
        self.assertIn('B', self.d1)
        d = self.Dict({'a':1, 'B':2})
        self.assertIn('b', d)

    def test_init(self):
        d = self.Dict()
        self.assertFalse(d)
        d = self.Dict({'a':1, 'B':2})
        self.assertTrue(d)

    def test_items(self):
        self.assertDictEqual(self.d1, self.d2)
        self.assertEqual(
            [v for v in self.d1.items()],
            [v for v in self.d2.items()])

    def test_json_dumps(self):
        s = json.dumps(self.d1)
        self.assertIn('a', s)
        self.assertIn('B', s)

    def test_keys(self):
        self.assertEqual(self.d1.keys(), self.d2.keys())

    def test_values(self):
        self.assertEqual(
            [v for v in self.d1.values()],
            [v for v in self.d2.values()])

You should use `.casefold()` instead of `.lower()` for comparisons, `self.casefold() == other.casefold()` and `hash(self.casefold())`, to allow "ß" and "ss" to equate as true, among others. – AJNeufeld, Sep 24 '19 at 20:11

Fred · Answer 7 · 2019-01-21T01:23:46.303

You can do a dict key case insensitive search with a one liner:

>>> input_dict = {'aBc':1, 'xyZ':2}
>>> search_string = 'ABC'
>>> next((value for key, value in input_dict.items() if key.lower()==search_string.lower()), None)
1
>>> search_string = 'EFG'
>>> next((value for key, value in input_dict.items() if key.lower()==search_string.lower()), None)
>>>

You can place that into a function:


def get_case_insensitive_key_value(input_dict, key):
    return next((value for dict_key, value in input_dict.items() if dict_key.lower() == key.lower()), None)

Note that only the first match is returned.

score 3 · Answer 8 · answered Jan 17 '10 at 19:17

While a case insensitive dictionary is a solution, and there are answers to how to achieve that, there is a possibly easier way in this case. A case insensitive search is sufficient:

import re

text = "Practice changing the Color"
words = {'color': 'colour', 'practice': 'practise'}

def replace(words, text):
    for key in words:
        # re.escape keeps keys containing regex metacharacters literal
        exp = re.compile(re.escape(key), re.I)
        text = exp.sub(words[key], text)
    return text

text = replace(words, text)
print(text)
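
One possible refinement, sketched here under the assumptions that the code runs on Python 3, that `words` is non-empty, and that all of its keys are lowercase: build a single pattern that matches whole words only. This avoids rewriting partial words such as "Technicolor", a concern raised in a comment on the question. The function name `replace_whole_words` is hypothetical.

```python
import re

words = {'color': 'colour', 'practice': 'practise'}


def replace_whole_words(words, text):
    # One alternation of all (escaped) keys, anchored at word boundaries.
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(key) for key in words) + r')\b',
        re.IGNORECASE,
    )
    # The matched text is lowercased before the lookup, so keys must be lowercase.
    return pattern.sub(lambda match: words[match.group(0).lower()], text)


print(replace_whole_words(words, "Practice changing the Color in Technicolor"))
# practise changing the colour in Technicolor
```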

answered Jan 17 '10 at 19:17

Jakob Borg

3

It's far better to use the built-in string methods than the regular expression module when the built-ins can easily handle it, which they can in this case. – John Y Jan 17 '10 at 19:44
thanks calmh. I'm short on time right now, so your quick and simple solution suits me nicely. thanks – Kim Jan 17 '10 at 19:54
@John Y: What would be the regexp-less solution to this? I don't see it. – Jakob Borg Jan 17 '10 at 19:57
Kim already mentioned it: use the string.lower() method. Other answers also mentioned it. Comments are no good for posting code, so I guess I will post my own answer. – John Y Jan 18 '10 at 05:33
+1 This solution worked best for me, since in my case, the case of the dictionary key matters, and simply lowercasing the key on set is not sufficient. – yobiscus May 13 '15 at 14:22

MTKnife · Answer 9 · 2019-06-11T20:10:03.623

If you only need to do this once in your code (hence, no point to a function), the most straightforward way to deal with the problem is this:

lowercase_dict = {key.lower(): value for (key, value) in original_dict.items()}

I'm assuming here that the dict in question isn't all that large--it might be inelegant to duplicate it, but if it's not large, it isn't going to hurt anything.

The advantage of this over @Fred's answer (though that also works) is that it produces the same result as a dict when the key isn't present: a KeyError.
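
A quick check of that behaviour, using a made-up `original_dict`. Note that `.items()` is needed so that the comprehension iterates over key/value pairs rather than over keys alone.

```python
original_dict = {'Color': 'colour', 'Practice': 'practise'}  # illustrative data
lowercase_dict = {key.lower(): value for key, value in original_dict.items()}

print(lowercase_dict['COLOR'.lower()])  # colour

try:
    lowercase_dict['Flavor'.lower()]
except KeyError as exc:
    print('missing key:', exc)  # missing key: 'flavor'
```

If two original keys differ only by case, the one that comes later in iteration order wins.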

score 0 · Answer 10 · answered Jan 19 '22 at 00:46

There are multiple approaches to this problem, each with its own pros and cons. Just to add to the list (it looks like this option wasn't mentioned), it's possible to extend the str class and use it as a key:

class CaseInsensitiveStr(str):
    def __hash__(self) -> 'int':
        return hash(self.lower())
    def __eq__(self, other:'str') -> 'bool':
        return self.lower() == other.lower()

It can work well if the dictionary in question is private and some kind of interface is used to access it.

class MyThing:
    def __init__(self):
        self._d: 'dict[CaseInsensitiveStr, int]' = dict()
    def set(self, key:'str', value:'int'):
        self._d[CaseInsensitiveStr(key)] = value
    def get(self, key:'str') -> 'int':
        return self._d[CaseInsensitiveStr(key)]

score 0 · Answer 11 · answered Mar 18 '23 at 02:26

Or...if you'd rather use an off-the-shelf product rather than hacking it yourself...try... https://pypi.org/project/case-insensitive-dictionary/

answered Mar 18 '23 at 02:26

John D. Aynedjian

score 0 · Answer 12 · answered Aug 31 '23 at 13:21

Credit: based on @m000's answer. The following variant provides a get_orig_key method, by keeping track of the case-sensitive key of the last "set" operation.

class RobbieCaseInsensitiveDict(dict):
    @classmethod
    def _k(cls, key):
        return key.lower() if isinstance(key, str) else key

    def __init__(self, *args, **kwargs):
        super(RobbieCaseInsensitiveDict, self).__init__(*args, **kwargs)
        self.key_dict = {}
        for key in self.keys():
            if isinstance(key, str):
                self.key_dict[key.lower()] = key
        self._convert_keys()

    def get_orig_key(self, case_ins_key):
        if case_ins_key in self.key_dict:
            return self.key_dict[case_ins_key]
        else:
            return case_ins_key

    def __getitem__(self, key):
        return super(RobbieCaseInsensitiveDict, self).__getitem__(self.__class__._k(key))

    def __setitem__(self, key, value):
        if isinstance(key, str):
            self.key_dict[key.lower()] = key
        super(RobbieCaseInsensitiveDict, self).__setitem__(self.__class__._k(key), value)

    def __delitem__(self, key):
        return super(RobbieCaseInsensitiveDict, self).__delitem__(self.__class__._k(key))

    def __contains__(self, key):
        return super(RobbieCaseInsensitiveDict, self).__contains__(self.__class__._k(key))

    def has_key(self, key):
        return super(RobbieCaseInsensitiveDict, self).has_key(self.__class__._k(key))

    def pop(self, key, *args, **kwargs):
        return super(RobbieCaseInsensitiveDict, self).pop(self.__class__._k(key), *args, **kwargs)

    def get(self, key, *args, **kwargs):
        return super(RobbieCaseInsensitiveDict, self).get(self.__class__._k(key), *args, **kwargs)

    def setdefault(self, key, *args, **kwargs):
        if isinstance(key, str):
            self.key_dict[key.lower()] = key
        return super(RobbieCaseInsensitiveDict, self).setdefault(self.__class__._k(key), *args, **kwargs)

    def update(self, E={}, **F):
        super(RobbieCaseInsensitiveDict, self).update(self.__class__(E))
        super(RobbieCaseInsensitiveDict, self).update(self.__class__(**F))

    def _convert_keys(self):
        for k in list(self.keys()):
            v = super(RobbieCaseInsensitiveDict, self).pop(k)
            self.__setitem__(k, v)

score -1 · Answer 13 · answered Sep 05 '18 at 20:29

I just set up a function to handle this:

def setLCdict(d, k, v):
    k = k.lower()
    d[k] = v
    return d

myDict = {}

So instead of

myDict['A'] = 1
myDict['B'] = 2

You can:

myDict = setLCdict(myDict, 'A', 1)
myDict = setLCdict(myDict, 'B', 2)

You can then either lowercase the key before looking it up or write a function to do so.

def lookupLCdict(d, k):
    k = k.lower()
    return d[k]

myVal = lookupLCdict(myDict, 'a')

Probably not ideal if you want to do this globally, but it works well if it's just a subset you wish to use it for.
