2

So I have a list of values:

alist = list()

And I want check to see if the members of the list are in a dict:

ahash = dict() #imagine I have filled a dictionary with data. 

for member in alist: 
     if member in hash:
        #DO STUFF 

This is very simple.

However what I would like to do is redefine IN to implement a fuzzy comparison. So what I would like to do is match things like FOOBARBAZZ with a * such that FOO* matches FOOBARBAZZ.

The most straightforward way I can think to do this implement this whole situation as a method in an object, and then overload the IN operator. However for my own reasons (completely pedantic) I would like avoid the OOP approach.

Without looping over the entire Dictionary for every comparison (which sounds wrong!) how can I implement my custom comparison for Dictionaries?

Extra: Does the IN operator have a different name besides IN? The naming makes information on the operator difficult to research in search engines. I think it might be the same as __contains__ but I have yet to come across how __contains__ works for Dictionaries.

Ulrich Eckhardt
  • 16,572
  • 3
  • 28
  • 55
MrSynAckSter
  • 1,681
  • 1
  • 18
  • 34
  • `ahash = hash()` is an incorrect syntax! the `hash()` function doesn't create a dictionary, it will return the hash value of it's input argument (if it is hashable). – Mazdak Apr 14 '16 at 16:11
  • I think you understood what I meant. – MrSynAckSter Apr 14 '16 at 16:14
  • You're losing a lot of the efficiency of a dict this way because you're going back to O(n) search solutions (getting away from the hash table). What if something matches multiple keys? Return a list? First one? – Two-Bit Alchemist Apr 14 '16 at 16:24
  • Also, in your question you talk about "overloading IN" and hint through that and your title that you don't want to do that for some unspecified reasons. Do you mean you don't want to use a subclass with a custom `__contains__`? Because that is not only the only sense in which you could "overload" `in`, it's also one of the only available solutions to your problem unless you don't want to use `in` at all. – Two-Bit Alchemist Apr 14 '16 at 16:27
  • In my use case returning the first one works. I am all ears for more efficient solutions to the issue. For my use case doing exact matches isn't going to work, because I may only have FOOBAR* and not the FOOBARBAZ for the 1:1 match. It could be that there's no efficient way to do this - maybe the best thing would be for me to flatten the data into a list or something (the data I have is already in a Dict.) – MrSynAckSter Apr 14 '16 at 16:30
  • What about non-string keys? – Two-Bit Alchemist Apr 14 '16 at 16:32
  • @Two-BitAlchemist I admit in the question that I am being pedantic. I prefer to avoid OOP constructs, and have already found the OOP solution. We don't have to use IN either, I just don't want to write a solution that's stupidly inefficient. Also, I'm not sure I understand why you are using sarcasm quotes on "overloading" - Because IN isn't technically an operator or a function? I admitted I am having trouble finding the documentation for "IN" in the question. If IN is a class, and I subclass, and redefine the function - isn't it still overloading? – MrSynAckSter Apr 14 '16 at 16:35
  • It's not sarcasm quotes. There's no overloading in Python period, as understood from other languages. In is not a class, it's a keyword. It 'magically' calls `obj.__contains__`. – Two-Bit Alchemist Apr 14 '16 at 16:36
  • Can the `*` be in other positions or just what is shown? Is FOO*BAZ valid? – Two-Bit Alchemist Apr 14 '16 at 16:38
  • @Two-BitAlchemist https://docs.python.org/2/reference/datamodel.html#special-method-names mentions operator overloading specifically. I see now that Python doesn't have function overloading, but I'm not 100% sure from what I'm reading that it can't be redefined on the fly somehow. Anyway I'm not attached to IN so badly as to force that solution. For my use case * can only be at the end of the query. – MrSynAckSter Apr 14 '16 at 16:42

3 Answers3

3

To override in you can subclass the built-in dict type and define a new __contains__ method (which is what in calls behind the scenes):

In [9]: class FuzzyDict(dict):
   ...:     def __contains__(self, needle):
   ...:         if '*' not in needle:
   ...:             return super(FuzzyDict, self).__contains__(needle)
   ...:         else:
   ...:             for key in self.keys():
   ...:                 if str(key).startswith(needle[:-1]):
   ...:                     return True
   ...:             return False
   ...:

This acts like a dict in most ways.

In [12]: my_dict = FuzzyDict(zip('abcde', range(1, 6)))

In [13]: my_dict
Out[13]: {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

In [14]: my_dict['apple'] = 6

Until you start using in tests:

In [15]: 'a' in my_dict
Out[15]: True

In [16]: 'a*' in my_dict
Out[16]: True

In [17]: 'ap*' in my_dict
Out[17]: True

In [18]: 'b*' in my_dict
Out[18]: True

In [19]: 'bi*' in my_dict
Out[19]: False

This is based on what I see in your post. If you need to support more than foo* then obviously the startswith test will not be sufficient and you may even have to use regexes. This also only overrides in -- if you want key access like my_dict['FOO*'] you'll need to also override __getitem__ and friends.

I do not see a way that this could be done in less than O(n) based on your requirements. The only reason dicts are O(1) access time is because of hashing, and you can't get the hash without the whole key.

Two-Bit Alchemist
  • 17,966
  • 6
  • 47
  • 82
1

The best way to answer this would be to translate anything in alist that you want to 'fuzzy match' into an regular expression. Then you can apply your regular expression to dict.keys(), examples might be here:

How to use re match objects in a list comprehension

Is there a formal language already defined for your fuzzy matches, or are you making one up? Turning "foo*" into an re could be done by

regex = re.sub("\*", ".*", list_element) + "$"

If the trailing '*' is the only symbol you are using for matches then your solution would be:

for member in alist:
   regex = re.sub("\*", ".*", member) + "$"
   if any([re.match(regex, x) for x in hash.keys()]):
     # do stuff

If you want to make your matching language more powerful you'll just have to make your translation into a regex more complicated.

Community
  • 1
  • 1
kingledion
  • 2,263
  • 3
  • 25
  • 39
1

There are at least two ways to accomplish your goal. In Example A, a quick query is run to determine if your member is part of your hash. It stops as soon as a match is found. On the other hand, Example B may prove to be more useful since all matching values are returned. This allows you to process that part of the hash dealing with your member without having to run another query.

#! /usr/bin/env python3


def main():
    """Demonstrate the usage of dict_contains and dict_search."""
    my_list = ['ist', 'out', 'ear', 'loopy']
    my_hash = {'a': 50, 'across': 14, 'ahash': 12, 'alist': 31, 'an': 73,
               'and': 11, 'are': 2, 'as': 34, 'avoid': 82, 'be': 3,
               'besides': 49, 'but': 45, 'can': 32, 'check': 51, 'come': 84,
               'comparison': 40, 'custom': 61, 'dictionary': 58,
               'different': 76, 'difficult': 85, 'do': 86, 'does': 13,
               'entire': 37, 'every': 33, 'filled': 77, 'foobarbazz': 20,
               'for': 42, 'fuzzy': 53, 'have': 30, 'how': 36, 'however': 68,
               'i': 74, 'if': 43, 'implement': 62, 'in': 57, 'information': 46,
               'is': 71, 'it': 83, 'like': 64, 'list': 55, 'looping': 70,
               'makes': 63, 'match': 16, 'matches': 1, 'member': 29,
               'members': 78, 'method': 7, 'might': 6, 'most': 28, 'my': 38,
               'name': 18, 'naming': 41, 'of': 52, 'on': 17, 'oop': 35,
               'operator': 21, 'over': 19, 'overload': 27, 'own': 72,
               'reasons': 79, 'redefine': 10, 'research': 22, 'same': 48,
               'search': 75, 'see': 5, 'situation': 39, 'so': 87, 'sounds': 24,
               'straightforward': 69, 'stuff': 15, 'such': 66, 'that': 47,
               'the': 56, 'then': 54, 'things': 81, 'think': 67, 'this': 59,
               'to': 9, 'very': 0, 'want': 23, 'way': 60, 'what': 44,
               'whole': 26, 'with': 8, 'without': 65, 'works': 4, 'would': 25,
               'yet': 80}
    # Example A
    for member in my_list:
        if dict_contains(my_hash, member):
            print('Found:', member)
    # Example B
    for member in my_list:
        match = dict_search(my_hash, member)
        if match:
            print('Query with', member, 'resulted in', match)
        else:
            print('Searching with', member, 'failed miserably')


def dict_contains(self, needle):
    """Check if search term can be found in any key of the given dict."""
    return any(needle in haystack for haystack in self)


def dict_search(self, pattern):
    """Return the dict's subset where the search term is found in the key."""
    return {key: value for key, value in self.items() if pattern in key}


if __name__ == '__main__':
    main()
Noctis Skytower
  • 21,433
  • 16
  • 79
  • 117