4

I have defined a custom object with multiple fields.

For example say I have a Student object, which consists of a name, ID, and age. To compare two students and determine whether they are the same student or not, I implemented a __ eq__ method that will return whether the age, name, and ID of the two students match up.

def __eq__(self, other):
   return self.name == other.name and self.ID == other.ID and self.age == other.age

Bear in mind that the student is just an example, so the fact that student ID's tend to be unique is not considered.

Suppose I have the following enrollment lists with an arbitrary number of Student objects

[S1, S2, S3]
[S2, S3]
[S3, S5, S4]
[S1, S4, S2, S1]

I would want to create some data structure that will contain the following elements

S1, S2, S3, S4, S5

The simplest way to do this would be to initialize some data structure that can hold lots of stuff, grab an item, check whether it exists in the structure, and add it if it doesn't.

new_list = some_new_list 
for each list of students:
  for each student in the list:
     check if the student is in new_list
     #decide what to do 

If I decided to implement it as a simple list, I could potentially make a lot of comparisons as my list continues to grow, especially if I have a ridiculous amount of students and enrollment lists.

What is an efficient way of implementing this? Both for comparing two objects and then using that comparison method to generate a unique set of objects.

EDIT: so I tried a simple set implementation.

>>>a = Student("sample", 1234, 18)
>>>b = Student("sample", 1234, 18)
>>>students = set()
>>>students.add(a)
>>>b in students
False
>>>b == a
True

Am I doing something wrong?

agf
  • 171,228
  • 44
  • 289
  • 238
MxLDevs
  • 19,048
  • 36
  • 123
  • 194
  • 2
    Why not use the built-in set type? The membership test is probably more efficient than you could hope for in pure Python. – Omri Barel Aug 11 '11 at 19:35
  • @omrib, so iterating over each list of students and then calling newSet.add(student) is as good as it gets? – MxLDevs Aug 11 '11 at 19:38
  • @agf, oh, I had thought it would be enough to just check whether an item exists in the set or not. ie: "test for membership" – MxLDevs Aug 11 '11 at 19:50
  • 1
    You don't __have__ to check whether or not they're in the set. Just add them all, and the set will end up holding the unique entries. This is faster than checking individually if one is unique or not. See my edit for `__hash__` – agf Aug 11 '11 at 19:52
  • @agf, that's probably it. Though, when I implemented the hash method and went and tried the two different student objects with the same name ID and age (ie: same student), they were both added to the set. – MxLDevs Aug 11 '11 at 19:59
  • @agf, oh, it works now. They both hash to the same value and only one of them is in the set. I probably made a typo. – MxLDevs Aug 11 '11 at 20:09

3 Answers3

8
from itertools import chain
myset = set(chain(iterable1, iterable2, iterable3, iterable4))

You get unique items, and you only iterate over each iterable once. chain makes one long iterable from a series of iterables. If you need it sorted, sorted(myset) will give you a sorted list.

Your Student class needs to implement a __hash__ that is compatible with it's __eq__:

def __hash__(self):
    return (self.name, self.ID, self.age).__hash__()
agf
  • 171,228
  • 44
  • 289
  • 238
  • And if you have a variable number of student lists (all contained in a list or tuple called `student_lists`), you can use `set(chain(*student_lists))`. – Omri Barel Aug 11 '11 at 21:14
  • @omrib Use `set(chain.from_iterable(student_lists))`, so `student_lists` doesn't have to be unpacked. whoever wrote `itertools.chain` thought of that, unlike things like `map` and `zip`. – agf Aug 11 '11 at 21:21
0

set is not guaranteed to be order-preserving. If you need an order-preserving list:

import itertools
from typing import List

def unique_items(*lists: List) -> List:
    """Return an order-preserving list of unique items from the given lists.

    The implemented approach requires that the input items are hashable.

    Example: unique_items([1,9,4], [2,4,6,8,8], [3,1]) -> [1, 9, 4, 2, 6, 8, 3]

    Ref: https://stackoverflow.com/a/68626841/
    """
    return list(dict.fromkeys(itertools.chain(*lists)))
Asclepius
  • 57,944
  • 17
  • 167
  • 143
-3

I have but one word for you.

set

Here are the docs for sets

Jakob Bowyer
  • 33,878
  • 8
  • 76
  • 91