-3

I'm looking for a minimal solution that exhibits the same API as set, but has deterministic iteration order. That is, the iteration order should depend only on the existing elements in the set. Here is a skeleton of the solution I came up with:

class MySet(UserDict):
 
    def __contains__(self, element):
        return element in self.data
 
    def __iter__(self):
        # from a comment by @juanpa.arrivillaga
        return iter(sorted(self.data))

    def __str__(self) -> str:
        return "{" + ", ".join([str(k) for k in self._keys()]) + "}"
 
    def add(self, element):
        # arbitraily map all elements to True
        self.data[element] = True
 
    def _keys(self):
        return sorted(self.data.keys())

Can it be improved ?

OrenIshShalom
  • 5,974
  • 9
  • 37
  • 87
  • 3
    If you want to answer your own question you can post the answer and question together. – Guy Aug 07 '22 at 10:30
  • There's always room for suggestions / improvements. this decouples the question from solution. – OrenIshShalom Aug 07 '22 at 10:37
  • 4
    Guy isn't talking about putting the answer in the question. Never do that. Guy is talking about the "Answer your own question" option on the "Ask Question" page, which lets you post an answer as you post the question. – user2357112 Aug 07 '22 at 10:44
  • 2
    The answer will be posted as a regular answer. It can be edited/deleted and other users can still post their answers. I'm guessing at least some of the downvotes are because it looks like a request for code without any effort on your part. – Guy Aug 07 '22 at 10:44
  • Oh I wasn't aware of the option - I'm actually not requesting any code - I was just missing this feature of deterministic set iteration so I though I'd share it - I guess it's still buggy - I'll fix it now. – OrenIshShalom Aug 07 '22 at 10:47
  • There's [ordered-set](https://pypi.org/project/ordered-set/) on pypi. – Aran-Fey Aug 07 '22 at 11:09
  • @Aran-Fey, this is the reason I decided to post - because I though that a complete pip package is a bit over-the-top for sub-classing `UserDict` – OrenIshShalom Aug 07 '22 at 11:11
  • @OrenIshShalom it's really not, implementing a sorted container takes some care to do so efficiently. – juanpa.arrivillaga Aug 07 '22 at 11:20
  • 1
    @Aran-Fey I think the OP actually wants a *sorted* set. In which case, the excellent [sortedcontainers](https://grantjenks.com/docs/sortedcontainers/) library is a good way to go. Nowadays, if you wanted an insertion orderd set, I would just use a `dict` will all `None` values. – juanpa.arrivillaga Aug 07 '22 at 11:22

1 Answers1

1

It's not clear from the question what precisely your requirement is, but based on your implementation it's just a set where only iteration is in sorted order (it's still not semantically ordered, e.g. you shouldn't be able to access elements positionally and comparison wouldn't consider order, and insertion order doesn't matter). In that case I'd implement the existing MutableSet abstract base class and compose around a vanilla set:

from collections.abc import MutableSet
from typing import Any, Iterable, Iterator, Protocol, TypeVar


class Sortable(Protocol):
    def __lt__(self, other) -> bool: ...


T = TypeVar("T", bound=Sortable)


class OrderedSet(MutableSet[T]):

    _values: set[T]

    def __init__(self, values: Iterable[T] = None, /) -> None:
        self._values = set() if values is None else set(values)

    def add(self, value: T) -> None:
        self._values.add(value)

    def discard(self, value: T) -> None:
        self._values.discard(value)

    def __contains__(self, item: Any) -> bool:
        return item in self._values

    def __iter__(self) -> Iterator[T]:
        return iter(sorted(self._values))

    def __len__(self) -> int:
        return len(self._values)

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}({list(self)!r})"

The only custom behaviour is the use of sorted in the __iter__ implementation. These are the tests I drove it out with:

from unittest import TestCase

from ordered_set import OrderedSet


class TestOrderedSet(TestCase):

    def test_contains(self):
        self.assertNotIn("foo", OrderedSet())
        self.assertIn("foo", OrderedSet(["foo", "bar", "baz"]))

    def test_iteration(self):
        os = OrderedSet(("foo", "bar"))
        os.add("baz")
        self.assertEqual(
            list(os),
            ["bar", "baz", "foo"]
        )

    def test_size(self):
        self.assertEqual(len(OrderedSet()), 0)
        self.assertEqual(len(OrderedSet(["foo", "bar", "baz"])), 3)

    def test_addition(self):
        os = OrderedSet()
        for value in ["foo", "bar", "baz"]:
            os.add(value)
        self.assertEqual(len(os), 3)
        self.assertIn("bar", os)

    def test_removal(self):
        os = OrderedSet(["foo", "bar", "baz"])
        os.remove("baz")
        self.assertEqual(len(os), 2)
        self.assertNotIn("baz", os)

    def test_rejects_keyword_args(self):
        with self.assertRaises(TypeError):
            OrderedSet(values=["foo", "bar", "baz"])

    def test_repr(self):
        self.assertEqual(
            repr(OrderedSet([3, 1, 2])),
            "OrderedSet([1, 2, 3])",
        )

I implement MutableSet rather than extending UserDict because:

  1. it has better semantics, what we're trying to create is a mutable set;

  2. it keeps you compliant with the Liskov substitution principle, meaning an OrderedSet can be used anywhere any other Set* or MutableSet can:

    • your implementation can be added to but lacks the removal methods: clear, discard, pop, remove;
    • your implementation also lacks the "magic" methods that allow e.g.:
      >>> OrderedSet((1, 3)) | {2, 4}
      OrderedSet([1, 2, 3, 4])
      
  3. it gives us more freedom in implementation. For example, if sorting the set for every iteration is a crucial performance bottleneck, we can keep the same tests and interface and swap over to a bisected list:

    from bisect import bisect_left
    from collections.abc import MutableSet
    from typing import Any, Iterable, Iterator, Protocol, TypeVar
    
    
    class Sortable(Protocol):
        def __lt__(self, other) -> bool: ...
    
    
    T = TypeVar("T", bound=Sortable)
    
    
    class OrderedSet(MutableSet[T]):
    
        _values: list[T]
    
        def __init__(self, values: Iterable[T] = None, /) -> None:
            self._values = [] if values is None else sorted(values)
    
        def add(self, value: T) -> None:
            index = bisect_left(self._values, value)
            if index == len(self) or self._values[index] != value:
                self._values.insert(index, value)
    
        def discard(self, value: T) -> None:
            index = bisect_left(self._values, value)
            if index < len(self) and self._values[index] == value:
                self._values.pop(index)
    
        def __contains__(self, value: Any) -> bool:
            index = bisect_left(self._values, value)
            return index < len(self) and self._values[index] == value
    
        def __iter__(self) -> Iterator[T]:
            return iter(self._values)
    
        def __len__(self) -> int:
            return len(self._values)
    
        def __repr__(self) -> str:
            return f"{self.__class__.__name__}({self._values!r})"
    
    

* but not where any set can, see e.g. Why are set methods like .intersection() not supported on set-like objects?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
  • Yes, the only customized behavior is sorted iteration. Is there any complexity penalty for the `UserDict` ? or other reason(s) you chose to subclass `MutableSet` ? – OrenIshShalom Aug 07 '22 at 11:45
  • @OrenIshShalom for one thing implementing `MutableSet` has better semantics - that's the kind of thing I'm actually trying to create, an `OrderedSet` _is a_ mutable set. It also gives you more choice on the implementation - e.g. if performance is critical and sorting every iteration is costly then it might be better to compose around a list, instead of a set, and [bisect](https://docs.python.org/3/library/bisect.html) it. – jonrsharpe Aug 07 '22 at 11:55
  • thanks ! one last question - what is `/)` in your `__init__` ? I thought it was a typo but the code works ... – OrenIshShalom Aug 07 '22 at 11:59
  • 1
    @OrenIshShalom https://peps.python.org/pep-0570/ (it's what makes `test_rejects_keyword_args` pass to match the real `set` creation interface). – jonrsharpe Aug 07 '22 at 12:02