Basically the title. I'm trying to store information about duplicate objects in a list of objects, but I'm having a hard time finding anything related to this. I've come up with this for now, but I'm not sure it's the best way to do what I want:

from dataclasses import dataclass
from typing import Optional

@dataclass
class People:
    name: Optional[str] = None
    age: Optional[int] = None

    # Equality and hashing use the name only, so people with the same name compare equal
    def __eq__(self, other):
        return self.name == other.name
    def __hash__(self):
        return hash(('name', self.name))

objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]
duplicates, temp = [], {}
# Record how often each name occurs and at which indices
for i, person in enumerate(objects):  # `person` avoids shadowing the built-in `object`
    if person.name not in temp:
        temp[person.name] = {'count': 1,
                             'indices': [i]}
    else:
        temp[person.name]['count'] += 1
        temp[person.name]['indices'].append(i)
# Collect every object whose name occurs more than once
for name, info in temp.items():
    if info['count'] > 1:
        print(f"Found duplicates of {name}")
        for i in info['indices']:
            duplicates.append(objects[i])

Edit: The People class is kept simple as an example. I thought about making it a dict, but that would be more complicated than keeping track of a list of objects. I want to build a new list of duplicates by name only, while keeping every other attribute/value of the original objects.

srcLegend
  • why not keep them in a dictionary to begin with? `people['general'] = [People('General'), People('General')]` – Him Jun 15 '21 at 00:49
  • Add a `__hash__` method to your class that calculates the hash from the instance attributes, then use it to compare class instances – p.konstantyn Jun 15 '21 at 00:55
  • Based on the answers below, it might be helpful to explicitly spell out the results you want. – Mark Jun 15 '21 at 00:57
  • Just a terminology note, you don't have a list of "class objects", which would mean in Python a list with *classes in it*, e.g. `[str, int, People]`. Everything is an object in Python. A class object refers to *a class*, just like a list object refers to a *list*. – juanpa.arrivillaga Jun 15 '21 at 02:16

1 Answer

Use collections.Counter.

from collections import Counter

...

counts = Counter(objects)
duplicates = [o for o, c in counts.items() if c > 1]
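As a quick check of what this returns, here is a minimal sketch that assumes the `People` class from the question (equality and hash based on `name` only). Because `Counter` is keyed by hash and equality, each repeated name shows up once, represented by the first object seen with that name:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class People:
    name: str = None
    age: int = None

    # Same name-only equality and hash as in the question
    def __eq__(self, other):
        return self.name == other.name

    def __hash__(self):
        return hash(('name', self.name))


objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]

counts = Counter(objects)
duplicates = [o for o, c in counts.items() if c > 1]

# One representative per repeated name: the first object seen with that name
print([(p.name, p.age) for p in duplicates])  # [('General', 12)]
```

So this tells you which names are repeated and how often, but it does not keep the other objects that share the name.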

If you want lists of objects matching certain criteria (e.g. all those with the same name), that's not really the same thing as getting a list of duplicates, but it's also very simple:

from collections import defaultdict

...

people_by_name = defaultdict(list)

for p in objects:
    people_by_name[p.name].append(p)

If you want to narrow that dictionary to only lists with more than one element, you can use a comprehension very similar to the one you'd use with the Counter:

people_by_name = {k: v for k, v in people_by_name.items() if len(v) > 1}
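Putting the pieces together, here is a rough end-to-end sketch using the sample data from the question. The names `dupes_by_name` and `duplicates` are just illustrative, and the final flattening step is an assumption about the flat list of duplicates you described. Since the grouping key is `p.name`, this doesn't rely on a custom `__eq__` or `__hash__` at all:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class People:
    name: str = None
    age: int = None


objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]

# Group every object under its name
people_by_name = defaultdict(list)
for p in objects:
    people_by_name[p.name].append(p)

# Keep only the names that occur more than once
dupes_by_name = {k: v for k, v in people_by_name.items() if len(v) > 1}

# Optional: flatten the groups into a single list of the original objects
duplicates = [p for group in dupes_by_name.values() for p in group]

print([(p.name, p.age) for p in duplicates])  # [('General', 12), ('General', 15)]
```

Every object keeps its own attributes (here, `age`), and the groups preserve the order in which the objects appeared in the original list.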
Samwise
  • I've tried this one before, but it only returns one of the multiple objects. In other words, if I want to list duplicate names of different ages of people, this wouldn't work. @Mark, I did add the "__hash__" property later on. Let me edit the OP – srcLegend Jun 15 '21 at 00:51
  • If they have different ages, they aren't duplicate objects. Sounds like you just want a dict of names to ages (e.g. make a `defaultdict(list)` and then do `d[name].append(age)` for each `name` and `age`). – Samwise Jun 15 '21 at 00:53
  • Second part of your answer is almost exactly what I want. Just need a way to remove entries that only have one sub-entry – srcLegend Jun 15 '21 at 01:10
  • To clarify, it lists duplicates under the same key (as I want) while still keeping unique entries – srcLegend Jun 15 '21 at 01:16
  • `people_by_name = {k: v for k, v in people_by_name.items() if len(v) > 1}` – Samwise Jun 15 '21 at 01:16