
Let's say I have two objects of the same class, objA and objB, related as follows:

objA == objB    # True
objA is objB    # False

If I use both objects as keys in a Python dict, they are treated as the same key and overwrite each other. Is there a way to override the dict's key comparison so that it uses `is` instead of `==`, and the two objects are seen as different keys?

Maybe I can override the equality method (__eq__) in the class or something? To be more specific, I am talking about two Tag objects from the BeautifulSoup4 library.

Here's a more specific example of what I am talking about:

from bs4 import BeautifulSoup

HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"

HTML_soup = BeautifulSoup(HTML_string, 'lxml')

first_h1 = HTML_soup.find_all('h1')[0]      #first_h1 = <h1>some_header</h1>
second_h1 = HTML_soup.find_all('h1')[1]     #second_h1 = <h1>some_header</h1>

print(first_h1 == second_h1)        # this prints True
print(first_h1 is second_h1)        # this prints False

my_dict = {}
my_dict[first_h1] = 1
my_dict[second_h1] = 1

print(len(my_dict))                 # my dict has only 1 entry!

# I want to have 2 entries in my_dict: one for key 'first_h1', one for key 'second_h1'.
Asked by David Simka (edited by Ajean)

2 Answers


first_h1 and second_h1 are Tag instances. When you use them as dict keys (my_dict[first_h1] or my_dict[second_h1]), the string representations of the tags are used for hashing. The problem is that both Tag instances have the same string representation (and they also compare equal with ==, so the dict treats them as a single key):

<h1>some_header</h1>

This is because the Tag class defines its __hash__() magic method as follows:

def __hash__(self):
    return str(self).__hash__()
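To see why both tags collapse into one key without installing bs4, here is a minimal pure-Python sketch. `FakeTag` is a hypothetical stand-in, not the real bs4 class: it assumes, as above, that the hash comes from the string form, and it uses a simplified structural equality (same name and text) in place of what Tag.__eq__ actually compares.

```python
class FakeTag:
    """Hypothetical stand-in for bs4's Tag: value-based hash and equality."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __str__(self):
        return f"<{self.name}>{self.text}</{self.name}>"

    def __hash__(self):
        # Same idea as the Tag.__hash__ shown above: hash the string form
        return str(self).__hash__()

    def __eq__(self, other):
        # Simplified structural equality: same tag name and same text
        return (isinstance(other, FakeTag)
                and self.name == other.name
                and self.text == other.text)


first_h1 = FakeTag("h1", "some_header")
second_h1 = FakeTag("h1", "some_header")

my_dict = {first_h1: 1}
my_dict[second_h1] = 1  # same hash and __eq__ is True, so this overwrites

print(first_h1 is second_h1)  # False
print(len(my_dict))           # 1
```

A dict treats two keys as the same when their hashes match and they compare equal (identity is checked first, then ==), so distinct but equal objects cannot coexist as keys.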

One workaround could be to use id() values as hashes, but that would mean redefining the Tag class inside BeautifulSoup itself. You can avoid that problem by writing your own custom "tag wrapper":

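class TagWrapper:
    """Wraps a Tag so that dict keys compare by identity, not by content."""

    def __init__(self, tag):
        self.tag = tag

    def __hash__(self):
        # Hash by the identity of the wrapped tag, not its string form
        return id(self.tag)

    def __eq__(self, other):
        # Equal only if both wrappers hold the very same Tag object
        return isinstance(other, TagWrapper) and self.tag is other.tag

    def __str__(self):
        return str(self.tag)

    def __repr__(self):
        return str(self.tag)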

Then, you'll be able to do:

from bs4 import BeautifulSoup

HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"
HTML_soup = BeautifulSoup(HTML_string, 'lxml')

first_h1 = HTML_soup.find_all('h1')[0]      # first_h1 = <h1>some_header</h1>
second_h1 = HTML_soup.find_all('h1')[1]     # second_h1 = <h1>some_header</h1>

my_dict = {}
my_dict[TagWrapper(first_h1)] = 1
my_dict[TagWrapper(second_h1)] = 1

print(my_dict)
# {<h1>some_header</h1>: 1, <h1>some_header</h1>: 1}

This is, however, neither pretty nor convenient to use. I would revisit your initial problem and check whether you actually need to use tags as dictionary keys.
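If you do still need a per-tag mapping, one common alternative (a different technique from the wrapper above) is to key the dict by id(tag) and store the tag itself in the value. The sketch below uses a hypothetical `Node` class in place of a bs4 Tag. Note that id() values are only guaranteed unique among objects alive at the same time, so the tags must stay referenced; here each value holds its own tag, which keeps it alive.

```python
class Node:
    """Hypothetical stand-in for a Tag: every instance compares equal."""

    def __eq__(self, other):
        return isinstance(other, Node)

    def __hash__(self):
        return 0


first_h1 = Node()
second_h1 = Node()

# Key by identity; keep the object next to its value so it stays alive
by_id = {}
by_id[id(first_h1)] = (first_h1, 1)
by_id[id(second_h1)] = (second_h1, 1)

print(len(by_id))  # 2

# Looking an object back up uses the same id() call
tag, value = by_id[id(second_h1)]
print(tag is second_h1, value)  # True 1
```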

You can also monkey-patch bs4 using Python's introspection powers, as was done here, but that means entering rather dangerous territory.
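For illustration only, this is roughly what such a monkey-patch looks like, shown on a hypothetical stand-in class rather than on bs4's Tag. Replacing __hash__ and __eq__ on the class switches every instance, everywhere in the program, to identity semantics, which is why it is risky: any code that relies on structural equality silently changes behavior.

```python
class FakeTag:
    """Hypothetical stand-in with value-based equality, like Tag."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, FakeTag) and self.text == other.text

    def __hash__(self):
        return hash(self.text)


a, b = FakeTag("x"), FakeTag("x")
print(len({a: 1, b: 2}))  # 1: equal objects share one key

# Monkey-patch: fall back to object's identity-based hash and equality
FakeTag.__hash__ = object.__hash__
FakeTag.__eq__ = object.__eq__

print(len({a: 1, b: 2}))  # 2: each instance is now its own key
print(a == b)             # False: the patch affects every comparison, not just dicts
```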

Answered by alecxe
You are missing an __eq__ method in the wrapper class, like so: def __eq__(self, other): return self.tag is other.tag. But thanks anyway, I got my code working because of your answer! – David Simka Jun 16 '17 at 23:01

It seems you want to override the == operator. You can do that by building a new class that implements __eq__:

def __eq__(self, obj):
    return self is obj

__hash__ = object.__hash__  # defining __eq__ alone makes instances unhashable in Python 3
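One caveat worth knowing: in Python 3, a class that defines __eq__ without also defining __hash__ gets __hash__ set to None, so its instances cannot be used as dict keys at all. The sketch below (both class names are hypothetical) shows the failure and the fix of restoring the default identity-based hash.

```python
class OnlyEq:
    def __eq__(self, obj):
        return self is obj


class IdentityKey:
    def __eq__(self, obj):
        return self is obj

    # Restore the default identity-based hash that defining __eq__ removed
    __hash__ = object.__hash__


try:
    {OnlyEq(): 1}
except TypeError as exc:
    print("OnlyEq cannot be a key:", exc)

a, b = IdentityKey(), IdentityKey()
d = {a: 1, b: 1}
print(len(d))  # 2
```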
Answered by Gefen Morami (edited by Uyghur Lives Matter)