Factory design pattern: inheritance vs delegation vs module-level functions

Question

I'm working on a module that lets users create instances of SQLAlchemy's URL objects specifically for connecting to MS SQL Server via pyodbc. The module needs to expose a convenient API where URLs can be created by specifying either hostname, port and database, or a DSN, or by passing a raw ODBC connection string. The string representations of those URLs would therefore look like the following, where the dialect and driver (mssql+pyodbc) are already specified and the rest is up to the user:

"mssql+pyodbc://<username>:<password>@<host>:<port>/<database>?driver=<odbc-driver>"
"mssql+pyodbc://<username>:<password>@<dsn>"
"mssql+pyodbc://<username>:<password>@?odbc_connect=<connection-string>"

Now this seems like a good use case for the factory pattern, whereby I create a separate method/function (e.g. from_hostname, from_dsn, from_connection_string) for each of the different ways to create a URL. But I can think of four different implementations of that pattern, and I'm wondering which one to prefer.

(Side notes: You'll notice below that I instantiate URLs via the class factory method URL.create. This is because the SQLAlchemy developers would like to keep users from instantiating URLs via direct calls to the default constructor. Also, for simplicity's sake I'm ignoring all sorts of other useful parameters that the methods/functions should accept, such as those for authentication.)

1 Inheritance

I subclass URL adding the value for the drivername argument of URL.create as a class attribute/constant. Then I add my class methods.

from sqlalchemy.engine import URL

class MyURL(URL):
    _DRIVERNAME = "mssql+pyodbc"

    @classmethod
    def from_hostname(cls, host, port, database):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "host": host,
            "port": port,
            "database": database,
            "query": {"driver": "ODBC Driver 17 for SQL Server"}
        }
        return super().create(**parts)

    @classmethod
    def from_dsn(cls, dsn):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "host": dsn
        }
        return super().create(**parts)

    @classmethod
    def from_connection_string(cls, connection_string):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "query": {"odbc_connect": connection_string}
        }
        return super().create(**parts)

Usage:

MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')

MyURL would of course inherit all methods from its parent, including MyURL.create, which allows for the instantiation of all sorts of URLs (including non-SQL-Server ones), and MyURL.set, which allows modifying the URL, including the drivername part. This goes against the intention of the MyURL class, which exists specifically to provide a few convenient methods to create URLs for SQL Server via pyodbc only. Also, since all those parent methods are now exposed by my module, I would feel obligated to document them for the user, which leads to a lot of redundant documentation (I suppose I could just refer to SQLAlchemy's documentation for all other methods and attributes, or something). But the bottom line is, all of this is somewhat undesirable.

Could it be that a parent-child relationship between URL and MyURL is actually not the right choice here, i.e. since it turns out we're not even interested in inheriting from URL in the first place, is MyURL semantically not a child of URL?

2 Delegation

The implementation of delegation is almost identical to inheritance, except we obviously remove the parent class from MyURL and replace the call to super with the class name.

from sqlalchemy.engine import URL

class MyURL:
    _DRIVERNAME = "mssql+pyodbc"

    @classmethod
    def from_hostname(cls, host, port, database):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "host": host,
            "port": port,
            "database": database,
            "query": {"driver": "ODBC Driver 17 for SQL Server"}
        }
        return URL.create(**parts)

    @classmethod
    def from_dsn(cls, dsn):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "host": dsn
        }
        return URL.create(**parts)

    @classmethod
    def from_connection_string(cls, connection_string):
        parts = {
            "drivername": MyURL._DRIVERNAME,
            "query": {"odbc_connect": connection_string}
        }
        return URL.create(**parts)

Usage:

MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')

This approach leaves MyURL without all the baggage from URL, and it doesn't imply a parent-child relationship. But it doesn't necessarily feel right either.

Is it overkill to create a class that does absolutely nothing other than encapsulate a few factory methods? Or maybe this is an anti-pattern, because we create a class MyURL even though there is not much use for instances of type MyURL (after all, we're only looking to create instances of URL)?

3 Module-level factory functions

This is a pattern along the lines of SQLAlchemy's own make_url factory function (which is essentially a wrapper around URL.create). I can think of two ways to implement it.

3.A Multiple factory functions

The implementation of this one is pretty straightforward. It's again almost identical to inheritance and delegation, except of course the functions and attributes aren't wrapped in a class.

from sqlalchemy.engine import URL

_DRIVERNAME = "mssql+pyodbc"

def url_from_hostname(host, port, database):
    parts = {
        "drivername": _DRIVERNAME,
        "host": host,
        "port": port,
        "database": database,
        "query": {"driver": "ODBC Driver 17 for SQL Server"}
    }
    return URL.create(**parts)

def url_from_dsn(dsn):
    parts = {
        "drivername": _DRIVERNAME,
        "host": dsn
    }
    return URL.create(**parts)

def url_from_connection_string(connection_string):
    parts = {
        "drivername": _DRIVERNAME,
        "query": {"odbc_connect": connection_string}
    }
    return URL.create(**parts)

Usage:

url_from_hostname('host', 1234, 'db')
url_from_dsn('my-dsn')
url_from_connection_string('Server=MyServer;Database=MyDatabase')

Does this create a somewhat "cluttered" module API? Is it, again, an anti-pattern to create a module API with separate functions that all do more or less the same thing? Shouldn't there be something that "connects" or "encapsulates" those clearly related functions (such as a class ...)?

3.B Single factory function

Trying to encapsulate all the different ways to create URLs by a single function means that certain parameters are mutually exclusive (host, port and database vs dsn vs connection_string). This makes the implementation a little more involved. Users will almost certainly make mistakes despite all documentation efforts, so one would probably want to validate the supplied function arguments and raise an exception if the combination of arguments doesn't make any sense. Decorators, as suggested here and here, seem like an elegant way to do that. Of course, the if-elif logic in the url function could also be extended to do all that, so this is really just one (and probably not the best) possible implementation.

from functools import wraps
from sqlalchemy.engine import URL

_DRIVERNAME = "mssql+pyodbc"

class MutuallyExclusiveError(Exception):
    pass

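def mutually_exclusive(*excl_args):
    # excl_args: groups of parameter names; at most one group may be used per call.
    def inner(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            # Count the groups for which at least one keyword argument was passed.
            # Positional arguments are not inspected here.
            counter = 0
            for ea in excl_args:
                if any(key in kwargs for key in ea):
                    counter += 1
            if counter > 1:
                raise MutuallyExclusiveError
            return f(*args, **kwargs)
        return wrapper
    return inner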

@mutually_exclusive(
    ["host", "port", "database"],
    ["dsn"],
    ["connection_string"]
)
def url(host=None, port=None, database=None, dsn=None, connection_string=None):
    parts = {
        "drivername": _DRIVERNAME,
        "host": host or dsn,
        "port": port,
        "database": database
    }
    if host:
        parts["query"] = {"driver": "ODBC Driver 17 for SQL Server"}
    elif connection_string:
        parts["query"] = {"odbc_connect": connection_string}
    return URL.create(**parts)

Usage:

url(host='host', port=1234, database='db')
url(dsn='my-dsn')
url(connection_string='Server=MyServer;Database=MyDatabase')

If the user passes positional rather than keyword arguments, they will completely bypass our validation, though, so that's an issue. Moreover, using positional arguments sensibly isn't really possible for DSNs and connection strings, unless one does something weird like url(None, None, None, 'my-dsn'). One solution would be to disallow positional arguments altogether by changing the function definition to def url(*, host=None, ...):, which makes every parameter keyword-only. All of the above also doesn't quite feel right.
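As a rough sketch of how the decorator could also catch positional arguments, one option is to map the call onto parameter names with inspect.signature before checking the groups. The function url_parts below is a hypothetical stand-in for url that only returns the keyword arguments that would go to URL.create, so the sketch runs without SQLAlchemy; treating None as "not supplied" is likewise an assumption.

```python
import inspect
from functools import wraps


class MutuallyExclusiveError(Exception):
    pass


def mutually_exclusive(*groups):
    """Reject calls that supply arguments from more than one group."""
    def inner(f):
        sig = inspect.signature(f)

        @wraps(f)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = sig.bind_partial(*args, **kwargs)
            # Assumption: a value of None counts as "not supplied".
            supplied = {
                name for name, value in bound.arguments.items()
                if value is not None
            }
            used = [g for g in groups if supplied.intersection(g)]
            if len(used) > 1:
                raise MutuallyExclusiveError(
                    f"arguments from different groups were combined: {used}"
                )
            return f(*args, **kwargs)
        return wrapper
    return inner


@mutually_exclusive(
    ["host", "port", "database"],
    ["dsn"],
    ["connection_string"],
)
def url_parts(host=None, port=None, database=None, dsn=None,
              connection_string=None):
    # Hypothetical stand-in for url(): returns the keyword arguments
    # that would be passed on to URL.create.
    parts = {"drivername": "mssql+pyodbc", "host": host or dsn,
             "port": port, "database": database}
    if host:
        parts["query"] = {"driver": "ODBC Driver 17 for SQL Server"}
    elif connection_string:
        parts["query"] = {"odbc_connect": connection_string}
    return parts
```

With this variant, url_parts('host', 1234, 'db', 'my-dsn') raises MutuallyExclusiveError, while url_parts(None, None, None, 'my-dsn') still passes the check.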

Is it bad practice when a function won't accept positional arguments? Is the whole concept of validating input not somewhat "un-pythonic", or does this merely refer to things like type checking? Is this generally just trying to force too much into a single function?

Any thoughts on all or part of the above (specifically questions raised in italics) would be very much appreciated.

Thanks!

score 0 · Accepted Answer · answered Oct 08 '21 at 22:59

I'll attempt to answer my own question as best I can. Let me first look at implementing the factory pattern via classes.

Comment on: 1 Inheritance

The two terms subclassing and subtyping are often mentioned within the context of inheritance. While the former implies a syntactic relationship through implementation reuse (implementation inheritance), the latter implies a semantic "is-a" relationship (interface inheritance). The two concepts are often conflated in Python, but when I ask whether MyURL objects are or aren't URL objects, I'm referring to the semantic relationship between the two.

Of course, when I subclass URL in my code example above, I am indeed creating a subtype of URL that satisfies the Liskov Substitution Principle (LSP): I added a few methods (i.e. I specialized URL), but I can still pass MyURL instances to SQLAlchemy's create_engine function and nothing breaks. That's because MyURL implements the complete interface of its (generalized) superclass.

What I am really trying to achieve, though, is for MyURL to not only add those few methods and attributes but to also only possess (or expose) a subset of the methods of its superclass, in an attempt to disable ways to create URL strings that would be incompatible with SQL Server. Others have asked about removing superclass methods in subclasses (see here and here for example), but doing so will violate the LSP as well as the "is-a" relationship between the two classes.

So I suppose that inheritance through subclassing is in fact not what I should be doing here.
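To make the LSP concern concrete, here is a minimal sketch with stand-in classes. BaseURL, RestrictedURL and switch_driver are hypothetical names, not SQLAlchemy's; BaseURL only mimics the idea of an immutable URL with a set method that returns a modified copy.

```python
from typing import NamedTuple, Optional


class BaseURL(NamedTuple):
    # Hypothetical stand-in for sqlalchemy.engine.URL, reduced to two fields.
    drivername: str
    host: Optional[str] = None

    def set(self, **changes):
        # Return a modified copy, in the spirit of URL.set.
        return self._replace(**changes)


class RestrictedURL(BaseURL):
    # Subclass that tries to "disable" part of the inherited interface.
    def set(self, **changes):
        if "drivername" in changes:
            raise TypeError("drivername is fixed for RestrictedURL")
        return super().set(**changes)


def switch_driver(url: BaseURL, drivername: str) -> BaseURL:
    # Written against BaseURL's interface; it should work for any subtype.
    return url.set(drivername=drivername)


# Works for the base class ...
switched = switch_driver(BaseURL("mssql+pyodbc", "host"), "postgresql")

# ... but fails for the restricted subclass, so RestrictedURL cannot be
# substituted wherever a BaseURL is expected.
try:
    switch_driver(RestrictedURL("mssql+pyodbc", "host"), "postgresql")
except TypeError as exc:
    error = exc
```

Code that is correct for the parent type breaks when handed the child, which is exactly the substitution failure described above.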

Comment on: 2 Delegation

Delegation is another form of implementation reuse, in which what is shared is not a class "blueprint" but an instance of a class. It's therefore more of a "has-a" relationship. Specifically, in my code example I'm doing an implicit delegation, since I'm not passing URL or an instance thereof as a parameter to MyURL's methods. URL.create is a class method, so I can access it directly. In fact, since SQLAlchemy's URLs are themselves a subclass of (immutable) tuples, I wouldn't even be able to create my specialized version after instantiating them.

Some of my confusion about what point there is to having instances of MyURL stemmed from the fact that I was still too focused on an "is-a" relationship. Realizing that this is not the case makes it clearer that what MyURL actually is, is a factory class to create URLs. I could rename it to MyURLFactory to make that distinction clearer.

I could even remove the @classmethod decorators. To use MyURL I would then have to instantiate it before use (although I'm not sure what the benefit of that would be):

my_url_factory = MyURLFactory()
my_url_factory.from_hostname('host', 1234, 'db')
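One conceivable benefit of instantiating the factory is per-instance configuration, combined with making the delegation explicit. The sketch below is only an illustration under assumptions: the create and odbc_driver constructor parameters and the _build helper are hypothetical, and dict stands in for URL.create so that the example runs without SQLAlchemy installed.

```python
class MyURLFactory:
    def __init__(self, create, odbc_driver="ODBC Driver 17 for SQL Server"):
        # `create` is the delegate, e.g. sqlalchemy.engine.URL.create.
        self._create = create
        self._odbc_driver = odbc_driver

    def _build(self, **parts):
        # Shared step: every URL gets the same drivername.
        return self._create(drivername="mssql+pyodbc", **parts)

    def from_hostname(self, host, port, database):
        return self._build(host=host, port=port, database=database,
                           query={"driver": self._odbc_driver})

    def from_dsn(self, dsn):
        return self._build(host=dsn)

    def from_connection_string(self, connection_string):
        return self._build(query={"odbc_connect": connection_string})


# `dict` stands in for URL.create here; in real use one would pass
# URL.create instead.
factory = MyURLFactory(create=dict, odbc_driver="ODBC Driver 18 for SQL Server")
parts = factory.from_hostname("host", 1234, "db")
```

Whether this flexibility is worth the extra instantiation step depends on whether different callers actually need different drivers or delegates.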

Looking at it from this angle makes me think that this might be a good way to go about my problem. But let's also revisit the module-level factory functions.

Comment on: 3.A Multiple factory functions

One problem I have with this solution is that the factory functions are all very closely related and do pretty much the same thing. There's a potential for code repetition. Of course, I could avoid that by moving shared code into a private function:

from sqlalchemy.engine import URL

_DRIVERNAME = "mssql+pyodbc"

def _make_parts_dict(**kwargs):
    # Merge the fixed drivername into the caller-supplied URL parts.
    return dict(kwargs, drivername=_DRIVERNAME)

def url_from_hostname(host, port, database):
    parts = _make_parts_dict(
        host=host,
        port=port,
        database=database,
        query={"driver": "ODBC Driver 17 for SQL Server"}
    )
    return URL.create(**parts)

def url_from_dsn(dsn):
    parts = _make_parts_dict(host=dsn)
    return URL.create(**parts)

def url_from_connection_string(connection_string):
    parts = _make_parts_dict(query={"odbc_connect": connection_string})
    return URL.create(**parts)

The resulting URLs would be the same. Nevertheless, this still leaves me with the "cluttered" module API - but then again, implementation 2 would leave me with the same "cluttered" class API ...

I could also combine 2 and 3.A by adding a _make_parts_dict class method to the MyURLFactory class.

Comment on: 3.B Single factory function

I don't have too many comments on this implementation. It's probably doable, but I think 2 or (less preferred) 3.A would be much simpler to implement and maintain. The complexity of taking care of the different mutually exclusive keyword arguments doesn't seem justified considering I'm just trying to create a few URLs. Also the lack of decent support for positional arguments bothers me.
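For comparison, a single function with keyword-only parameters and plain validation (no decorator) might look roughly like the sketch below. The name url_kwargs is hypothetical, and the function returns the keyword arguments intended for URL.create rather than a URL, so that it runs without SQLAlchemy.

```python
def url_kwargs(*, host=None, port=None, database=None, dsn=None,
               connection_string=None):
    """Return keyword arguments for URL.create (hypothetical helper)."""
    # Determine which of the three mutually exclusive modes were requested.
    modes = {
        "hostname": any(v is not None for v in (host, port, database)),
        "dsn": dsn is not None,
        "connection_string": connection_string is not None,
    }
    chosen = [name for name, used in modes.items() if used]
    if len(chosen) != 1:
        raise ValueError(
            "specify exactly one of host/port/database, dsn or "
            f"connection_string (got: {chosen or 'nothing'})"
        )
    parts = {"drivername": "mssql+pyodbc"}
    if modes["hostname"]:
        parts.update(host=host, port=port, database=database,
                     query={"driver": "ODBC Driver 17 for SQL Server"})
    elif modes["dsn"]:
        parts["host"] = dsn
    else:
        parts["query"] = {"odbc_connect": connection_string}
    return parts
```

Because of the bare * in the signature, a positional call such as url_kwargs('host', 1234, 'db') fails with a TypeError before any validation runs. Even so, the function still needs more branching than three separate factory functions would.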

¹ A potential hack would be to keep all the references to the drivername parameter in all the inherited methods, but change all the implementations to simply ignore it or pull the value from a class attribute.
