3

In the problem I'm working, there are data identifiers that have the form scope:name, being both scope and name strings. name has different parts separated by dots, like part1.part2.part3.part4.part5. On many occasions, but not always, scope is just equal to part1 of name. The code I'm writing has to work with different systems that provide or require the identifiers in different patterns. Sometimes they just require the full string representation like scope:name, on some other occasions calls have two different parameters scope and name. When receiving information from other systems, sometimes the full string scope:nameis returned, sometimes scope is omitted and should be inferred from name and sometimes a dict that contains scope and name is returned.

To ease the use of these identifiers, I have created a class to internally manage them, so that I don't have to write the same conversions, splits and formats over and over again. The class is quite simple. It only has two attributes (scope and name, a method to parse strings into objects of the class, and some magic methods to represent the objects Particularly, __str__(self) returns the object in the form scope:name, which is the fully qualified name (fqn) of the identifier:

class DID(object):
    """Represent a data identifier."""

    def __init__(self, scope, name):
        self.scope = scope
        self.name = name

    @classmethod
    def parse(cls, s, auto_scope=False):
        """Create a DID object given its string representation.

        Parameters
        ----------
        s : str
            The string, i.e. 'scope:name', or 'name' if auto_scope is True.

        auto_scope : bool, optional
            If True, and when no scope is provided, the scope will be set to
            the projectname. Default False.

        Returns
        -------
        DID
            The DID object that represents the given fully qualified name.

        """
        if isinstance(s, basestring):
            arr = s.split(':', 2)
        else:
            raise TypeError('string expected.')

        if len(arr) == 1:
            if auto_scope:
                return cls(s.split('.', 1)[0], s)
            else:
                raise ValueError(
                    "Expecting 'scope:name' when auto_scope is False"
                )
        elif len(arr) == 2:
            return cls(*arr)
        else:
            raise ValueError("Too many ':'")

    def __repr__(self):
        return "DID(scope='{0.scope}', name='{0.name}')".format(self)

    def __str__(self):
        return u'{0.scope}:{0.name}'.format(self)

As I said, the code has to perform comparisons with strings and use the string representation of some methods. I am tempted to write the __eq__ magic method and its counterpart __ne__. The following is an implementation of just __eq__:

    # APPROACH 1:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.scope == other.scope and self.name == other.name
        elif isinstance(other, basestring):
            return str(self) == other
        else:
            return False

As you see, it defines the equality comparison between both DIDs and strings in a way that is possible to compare one with the other. My issue with this is whether it is a good practice:

On the one hand, when other is a string, the method casts self to be a string and I keep thinking on explicit better than implicit. You could end up thinking that you are working with two strings, which is not the case of self.

On the other hand, from the point of view of meaning, a DID represents the fqn scope:name and it makes sense to compare for equality with strings as it does when an int and a float are compared, or any two objects derived from basetring are compared.

I also have thought on not including the basestring case in the implementation, but to me this is even worse and prone to mistakes:

    # APPROACH 2:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.scope == other.scope and self.name == other.name
        else:
            return False

In approach 2, a comparison for equality between a DID object and a string, both representing the same identifier, returns False. To me, this is even more prone to mistakes.

Which are the best practices in this situation? Should the comparison between a DID and a string be implemented as it is in approach 1, even though objects from different types might be considered equal? Should I use approach 2 even though s != DID.parse(s)? Should I not implement the __eq__ and __ne__ so that there are never misunderstoods?

TrilceAC
  • 301
  • 3
  • 10
  • 1
    Have you ever tried `1 == 1.0`? Note however that `__eq__` does not exist in isolation... that's why `hash(1) == hash(1.0)`, you don't want to break Liskov principle with a custom implementation of `__eq__` that braks hashing of subclasses or stuff like that... – Giacomo Alzetta Jul 22 '19 at 13:04
  • 1
    For the record, if you're on Python 2, and you implement `__eq__`, make sure to [implement `__ne__` in terms of `__eq__`](https://stackoverflow.com/a/35781654/364696); Python 3 will do this for you, Python 2 will not. – ShadowRanger Jul 22 '19 at 13:07
  • 1
    Duck typing would suggest you simply try the attribute comparison, and if an `AttributeError` gets raised, then fall back to string comparison. Don't worry about the exact type of `other`. – chepner Jul 22 '19 at 13:08
  • @GiacomoAlzetta, you formally pointed out my concerns with the approach I'm proposing. `__eq__` does not exist in isolation. Thanks. – TrilceAC Jul 22 '19 at 13:27
  • @ShadowRanger. I took that into account. It's something important to take into account when stuck in Python 2. – TrilceAC Jul 22 '19 at 13:29
  • @chepner, I also implemented a version with the duck typing behaviour that you mention, but somehow it was even more rare to allow comparison of two objects bases on the fact that they have an scope attribute and a name attribute, which is the idea of duck typing. [This post](https://devinpractice.com/2016/11/29/python-objects-comparison/) explains why I discarded your approach. – TrilceAC Jul 22 '19 at 13:36

1 Answers1

3

A few classes in Python (but I can't think of anything in the standard library off the top of my head) define an equality operator that handles multiple types on the RHS. One common library that does support this is NumPy, with:

import numpy as np

np.array(1) == 1

evaluating to True. In general I think I'd discourage this sort of thing, as there are lots of corner cases where this behaviour can get tricky. E.g. see the write up in Python 3 __hash__ method (similar things exist in Python 2, but it's end-of-life). In cases where I have written similar code, I've tended to end up with something closer to:

def __eq__(self, other):
    if isinstance(other, str):
        try:
            other = self.parse(str)
        except ValueError:
            return NotImplemented

    if isinstance(other, DID):
        return self.scope == other.scope and self.name == other.name

    return NotImplemented

Further to this, I'd suggest making objects like this immutable and you have a few ways of doing this. Python 3 has nice dataclasses, but given that you seem to be stuck under Python 2, you might use namedtuples, something like:

from collections import namedtuple

class DID(namedtuple('DID', ('scope', 'name'))):
    __slots__ = ()

    @classmethod
    def parse(cls, s, auto_scope=False):
       return cls('foo', 'bar')

    def __eq__(self, other):
        if isinstance(other, str):
            try:
                other = self.parse(str)
            except ValueError:
                return NotImplemented

        return super(DID, self).__eq__(other)

which gives you immutability and a repr method for free, but you might want to keep your own str method. The __slots__ attribute means that accidentally assigning to obj.scopes will fail, but you might want to allow this behaviour.

Sam Mason
  • 15,216
  • 1
  • 41
  • 60
  • 1
    Note: I'd suggest returning `NotImplemented` rather than `False` when you don't know how to compare the types. That allows the right hand side's `__eq__` to be checked (in case it knows how to perform the comparison). – ShadowRanger Jul 22 '19 at 13:08
  • @ShadowRanger yup, that's probably better! not sure if it'll matter here though, we know the RHS is a `str` and hence it'll only ever return `False` won't it? maybe if somebody else subclasses it? – Sam Mason Jul 22 '19 at 13:16
  • 2
    We know *nothing* about the RHS. It's the case where it's *not* a `str` or a `DID` that returning `NotImplemented` handles (it's needed more in your first `__eq__` example than your second). You're planning for code reuse; at some point in the future, someone else may write some other class that should be comparable to `DID`. If you don't write a proper `__eq__`, then whether the equality comparison is performed correctly will depend on order; `did == didcomparable` always return `False`, while `didcomparable == did` would return `True` when appropriate. `NotImplemented` fixes the former. – ShadowRanger Jul 22 '19 at 13:25
  • oops, sorry I forgot about that line! had moved on to other things, and was only thinking about the class I wrote at the bottom. what you said is of course valid – Sam Mason Jul 22 '19 at 13:28
  • @SamMason as you guess, I'm stuck in Python2. I wasn't aware of namedtuples. In the solution you propose, `__eq__` has also been extended to compare `DID`s and tuples, named or not, i.e. `DID('foo', 'bar') == ('foo', 'bar')`. At this point, I have no strong opinion on this. As [doc](https://docs.python.org/2/library/collections.html#collections.namedtuple) states, subclassing when adding new fields deserves some attention: _Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the `_fields` attr_ i.e., there is no inheritance of methods. – TrilceAC Jul 22 '19 at 15:11
  • 1
    @TrilceAC it also allows tuples because it just passes everything off to `super.__eq__`, you could explicitly check that `other` is an instance of `DID` first, otherwise return `NotImplemented`. AFAIK, the "subclassing" comment in the docs is mostly because methods like the initialiser, str and eq won't see these added fields unless you do things officially. it inherits methods just fine, you end up with a standard class, or I'm misunderstanding something again :) – Sam Mason Jul 22 '19 at 15:22
  • @SamMason, look at how `Point3D` _inherits_ from `Point` at the [doc](https://docs.python.org/2/library/collections.html#collections.namedtuple): `Point3D = namedtuple('Point3D', Point._fields + ('z',))`. Inheritance would mean that there is an `hypot`method for `Point3D`, but trying to call hypot() on a Point3D raises an `AttributeError` exception. – TrilceAC Jul 22 '19 at 16:20
  • 1
    `Point3D` doesn't "inherit" from `Point` it creates a new, independant, class that happens to use the same fields as `Point` with an additional field `z`! this isn't what most people would mean when they talk about inheritance anyway. – Sam Mason Jul 22 '19 at 16:25
  • @SamMason Right, but it is what the doc recommends when one wants to add a new stored field. This is why I pasted this sentence that should be taken into consideration: _Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the `_fields` attr_. You cannot inherit from `Point` because _is not useful for adding new, stored fields_. You should _create a new named tuple type from the `_fields`attr_ but it doesn't inherit. It seems that if you want to add a new field to the tuple, your only option is to reimplement almost all the class. – TrilceAC Jul 22 '19 at 19:07