The purpose of this was to predict the outcomes of list slicing and comparisons in a more complicated project with user-defined objects. I thought that the effect, if not the purpose, of overriding the hash function was to influence those outcomes, but it does not do that here, and as done here it is not clear how it could. If __eq__ is overridden then an overridden __hash__ has to be there, but it can return 'rhubarb' and still not affect the outcomes here. Since the comparison is being done by __eq__, what is the purpose of the hash function, and in what way is its return value actually used?

class Myobj:
    def __init__(self,name,suffix='xx', age=21):
        self.name=name
        self.age=age
        self.suffix=suffix
        self.handle=self.name +self.suffix
    def __eq__(self,other):     
        return self.name==other.name and  self.age==other.age   #returns bool
    def __hash__(self):     
        return hash(self.suffix)    #or any or all of name,age,suffix or anything - no difference
    def __repr__(self):
        return self.name
    def __str__(self):
        return f'{self.handle}'

a=Myobj('one')
b=Myobj('two',suffix='yy')
c=Myobj('three')
d=Myobj('four')
e=Myobj('one',age=10)
g=Myobj('one',suffix='yy')

# with __eq__ and __hash__ overridden
print([a,b,c,d,e])      #[one, two, three, four, one]
print(a,b)              #onexx twoyy
print()
print(f' a=c? {a==c}')  #returns False, names are not=
print(f' a=e? {a==e}')  #returns False, ages  are not=
print(f' a=g? {a==g}')  #returns True, names=, ages= but self.suffix!=other.suffix
print(hash(a),hash(g))  #791158507 -1150071058
print(hash(a.suffix))   #791158507
print(hash('xx'))       #791158507
print(Myobj.__hash__(a)) #791158507
print(set([a,b,c,d,e,g]))   #{one, one, two, one, four, three}

# now with default hash and eq dunders
# print([a,b,c,d,e])        #[one, two, three, four, one]
# print(a,b)                #onexx twoyy
# print()
# print(f' a=c? {a==c}')    #returns False
# print(f' a=e? {a==e}')    #returns False
# print(f' a=g? {a==g}')    #returns False
# print(hash(a),hash(g))    #1463830 1463857
# print(hash(a.suffix))     #-819204916
# print(hash('xx'))         #-819204916
# print(Myobj.__hash__(a))  #1463830
ragim_w

1 Answer

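EDIT: I think the point you're missing is that `==` never calls `__hash__`, so a custom hash function cannot change the result of your comparisons. Dicts and sets use the hash to decide where to look before they call `__eq__`, so as long as equal objects return equal hashes, the choice of hash function affects performance rather than results. Consider this hash function: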

    def __hash__(self):     
        return 0

This is the worst-case scenario: all objects return the same hash value, so every key collides. This will hurt performance, but apart from that everything will work fine.

In [1]: class A:
   ...:     def __hash__(self):
   ...:         return 0
   ...:

In [6]: huge_dict = {A():1 for _ in range(10_000)}

In [7]: a = A()

In [8]: huge_dict[a] = 5

In [9]: %timeit huge_dict[a]
211 µs ± 50.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [10]: class B():
    ...:     pass
    ...:

In [11]: huge_dict_better_hash =  {B():1 for _ in range(10_000)}

In [12]: b = B()

In [13]: huge_dict_better_hash[b] = 5

In [14]: %timeit huge_dict_better_hash[b]
42.7 ns ± 1.43 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [20]: f"better hash was {(211*10**-6)/(42.7*10**-9)} times faster"
Out[20]: 'better hash was 4941.451990632318 times faster'
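
The timing above only shows the speed cost. As a rough check of the "everything will work fine" part, here is a minimal sketch with a hypothetical `Point` class (not from the question) that pairs a value-based `__eq__` with the constant hash. Sets and dicts should still deduplicate and look up correctly, because `__eq__` makes the final decision:

```python
class Point:
    """Hypothetical class: value-based equality, deliberately terrible hash."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return 0  # every instance collides with every other


# Two of the three points are equal, so the set keeps only two of them.
points = {Point(1, 2), Point(1, 2), Point(3, 4)}
unique_count = len(points)

# Lookup with a different but equal instance still finds the entry.
lookup = {Point(1, 2): "found"}
value = lookup[Point(1, 2)]
```

The results are the same as with a good hash; only the number of `__eq__` calls needed to get there grows.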

> Since the comparison is being done by eq what is the purpose of the hash function and in what way is its return actually used?

This is really the fundamental question of why we need hashing in the first place.

Learning how dicts and sets (hash tables) are implemented will help you understand hashes.
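
To make that concrete, here is a toy sketch of a hash-based set. `ToyHashSet` is a hypothetical teaching class using simple chained buckets; CPython's real `set` uses a different layout (open addressing) and extra shortcuts, so treat this only as an illustration of where the return value of `__hash__` is used: it picks the bucket, and `__eq__` is then called only against the items already in that bucket.

```python
class ToyHashSet:
    """Teaching sketch of a chained hash table; not CPython's actual layout."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # The hash value is used here, and only here: to choose a bucket.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        # __eq__ is consulted only for items that landed in the same bucket.
        if not any(existing == item for existing in bucket):
            bucket.append(item)

    def __contains__(self, item):
        return any(existing == item for existing in self._bucket(item))

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


s = ToyHashSet()
for word in ["xx", "yy", "xx"]:
    s.add(word)
size = len(s)
```

With a good hash each bucket holds very few items, so a lookup costs a handful of comparisons. With a constant hash everything lands in one bucket and each lookup degrades to a linear scan.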

Here's how to implement __hash__ the right way for a custom class
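
A common convention (an assumption here, not something quoted from the linked answer) is to hash a tuple of exactly the fields that `__eq__` compares. The sketch below is a trimmed variant of the question's `Myobj` with that change. In the question, `a == g` is True but the two hash differently because `__hash__` uses `suffix`, which is why the printed set contains three objects named `one`; with a consistent hash, the set treats `a` and `g` as the same element.

```python
class Myobj:
    """Variant of the question's class with __hash__ consistent with __eq__."""

    def __init__(self, name, suffix="xx", age=21):
        self.name = name
        self.age = age
        self.suffix = suffix

    def __eq__(self, other):
        if not isinstance(other, Myobj):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal objects
        # are guaranteed to produce equal hashes.
        return hash((self.name, self.age))

    def __repr__(self):
        return self.name


a = Myobj("one")
e = Myobj("one", age=10)
g = Myobj("one", suffix="yy")

# a and g are equal and now hash alike, so only one of them is kept.
unique = set([a, e, g])
```

This is the one situation where the hash function does change a program's output: when it disagrees with `__eq__`, sets and dicts can end up holding objects that compare equal.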

RafalS
  • Wow. After days of reading up on the subject, trials and error and quite a bit of effort in composing my question, it is closed without reference to it within an hour. RafalS, thanks for yours, yes I had read two of your three. – ragim_w May 03 '20 at 15:33
  • I edited the answer, let me know if it answered your question. – RafalS May 03 '20 at 15:43