I want to merge two lists of class Version(object) objects into one while skipping the duplicates, but Python seems to treat two Version() objects as different even though their content is the same.

I tried giving the object a custom "comparison" method as instructed on https://stackoverflow.com/a/1227325/10881866

This is the class I am trying to compare:

class Version(object):
    valid_version = False
    version = None
    valid_platform = False
    platform = None
    valid_sign = False
    sign = None
    def __init__(self, version, platform, sign):
        version_match = search(version_pattern, version)
        if (version_match): self.version = version_match.string; self.valid_version = True
        else: self.version = version
        self.platform = platform
        self.valid_platform = platform in platforms
        sign_match = search(sign_pattern, sign)
        if (sign_match): self.sign = sign_match.string; self.valid_sign = True
        else: self.sign = sign
    def __str__(self): return str(self.__dict__)
    # def __eq__(self, other): return self.sign == other.sign

This is the helper function I used for merging (also found here on SO):

def merge_no_duplicates(iterable_1, iterable_2):
    myset = set(iterable_1).union(set(iterable_2))
    return list(myset)

This is the part where I merge the lists:

try:
        remote_versions = getVersionsFromRemote()
        logger.info("Loaded {} remote versions".format(len(remote_versions)))
        versions = merge_no_duplicates(versions, remote_versions)
except: logger.error("Can't load remote versions!")
try:
        local_versions = getVersionsFromLocal()
        logger.info("Loaded {} local versions".format(len(local_versions)))
        versions = merge_no_duplicates(versions, local_versions)
except: logger.error("Can't load local versions!")
versions = list(filter(None, versions))
logger.info("Got {} versions total.".format(len(versions)))

Expected:

2019-02-10 19:14:38,220|INFO    | Loaded 156 remote versions
2019-02-10 19:14:38,223|INFO    | Loaded 156 local versions
2019-02-10 19:14:38,223|INFO    | Got 156 versions total.

Actual:

2019-02-10 19:14:38,220|INFO    | Loaded 156 remote versions
2019-02-10 19:14:38,223|INFO    | Loaded 156 local versions
2019-02-10 19:14:38,223|INFO    | Got 312 versions total.
Bluescream

1 Answer

If you want the sets to remove duplicates, you need to define __eq__ and __hash__ methods. Here's a simple example:

class WithoutMethods:
    def __init__(self, a, b):  # Note no class-level attribute declaration
        self.a = a
        self.b = b
    def __repr__(self):
        return "WithoutMethods({0.a}, {0.b})".format(self)

class WithMethods:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return "WithMethods({0.a}, {0.b})".format(self)
    def __eq__(self, other):
        if not isinstance(other, WithMethods):
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)
    def __hash__(self):
        return hash((self.a, self.b))  # There are lots of ways to define hash methods.
                                       # This is the simplest, but may lead to collisions 

print({WithoutMethods(1, 2), WithoutMethods(1, 2)})
# {WithoutMethods(1, 2), WithoutMethods(1, 2)}
print({WithMethods(1, 2), WithMethods(1, 2)})
# {WithMethods(1, 2)}
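
The same pattern could be applied to a class shaped like the one in the question. The following is only a sketch under some assumptions: the regex and platform validation are left out, the `_key` helper is a hypothetical name introduced here, and the choice of `version`, `platform` and `sign` as the identifying fields is a guess (the question's commented-out `__eq__` compared only `sign`).

```python
class Version(object):
    """Simplified stand-in for the question's class (validation omitted)."""

    def __init__(self, version, platform, sign):
        self.version = version
        self.platform = platform
        self.sign = sign

    def _key(self):
        # Assumption: these three fields together identify a version.
        return (self.version, self.platform, self.sign)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must be derived from the same fields that __eq__ compares.
        return hash(self._key())

    def __repr__(self):
        return "Version({!r}, {!r}, {!r})".format(*self._key())


def merge_no_duplicates(iterable_1, iterable_2):
    # Same helper as in the question, relying on __eq__ and __hash__.
    return list(set(iterable_1).union(iterable_2))


remote = [Version("1.0", "win", "abc"), Version("1.1", "win", "def")]
local = [Version("1.0", "win", "abc"), Version("1.2", "linux", "ghi")]
merged = merge_no_duplicates(remote, local)
print(len(merged))  # 3
```

Because the hash is computed from the attributes, they should not be changed while an object is stored in a set or used as a dict key.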

Both methods are needed because of how sets (and dicts) store their values. When you add an object to a set, the set doesn't compare it to every other object in the set to determine if it is a duplicate. Instead, it uses the hash value of the object to jump to the appropriate place in the set, and then checks whether an equal object is already stored there. (This is a simplification, because sometimes unequal objects have the same hash value.) Even if you define an __eq__ method, the set will never bother to compare two objects if they have different hash values.
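
To see that last point in isolation, here is a small sketch with a hypothetical class `EqOnly` (not from the question) that compares by value but keeps the default identity-based hash. Note that in Python 3, defining `__eq__` without `__hash__` sets `__hash__` to `None` and makes instances unhashable, so the default hash is restored explicitly here just to reproduce the mismatch.

```python
class EqOnly:
    """Equal by value, but hashed by identity (deliberately inconsistent)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value

    # Restore the default identity-based hash; without this line Python 3
    # would set __hash__ to None because __eq__ is overridden.
    __hash__ = object.__hash__


a, b = EqOnly(1), EqOnly(1)
print(a == b)       # True
print(len({a, b}))  # 2: the hashes differ, so __eq__ is never consulted
```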

Patrick Haugh