I'm working on a module that allows a user to create instances of SQLAlchemy's URL
objects specifically for connecting to MS SQL Server via pyodbc. The module needs to expose a convenient API where URL
s can be created by specifying either hostname, port and database, or a DSN, or by passing a raw ODBC connection string. The string representations of those URL
s would therefore look like the following, where the database and driver are already specified and the rest is up to the user:
"mssql+pyodbc://<username>:<password>@<host>:<port>/<database>?driver=<odbc-driver>"
"mssql+pyodbc://<username>:<password>@<dsn>"
"mssql+pyodbc://<username>:<password>@?odbc_connect=<connection-string>"
Now this seems like a good use case for the factory pattern, whereby I create a separate method/function (e.g. from_hostname
, from_dsn
, from_connection_string
) for each of the different ways to create a URL
. But I can think of four different implementations of that pattern, and I'm wondering which one to prefer.
(Side notes: You'll notice below, that I instantiate URL
s via the class factory method URL.create
. This is because the SQLAlchemy developers would like to keep users from instantiating URL
s via direct calls to the default constructor. Also, for simplicity's sake I'm ignoring all sorts of other useful parameters that the methods/functions should accept, such as for authentication.)
1 Inheritance
I subclass URL
adding the value for the drivername
argument of URL.create
as a class attribute/constant. Then I add my class methods.
from sqlalchemy.engine import URL
class MyURL(URL):
_DRIVERNAME = "mssql+pyodbc"
@classmethod
def from_hostname(cls, host, port, database):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return super().create(**parts)
@classmethod
def from_dsn(cls, dsn):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": dsn
}
return super().create(**parts)
@classmethod
def from_connection_string(cls, connection_string):
parts = {
"drivername": MyURL._DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return super().create(**parts)
Usage:
MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')
MyURL
would of course inherit all methods from its parent, including MyURL.create
which allows for the instantiation of all sorts of URL
s (including non-SQL-Server ones), or MyURL.set
which allows to modify the URL
, including the drivername
part. This goes against the intention of the MyURL
class, which exists specifically to provide a few convenient methods to create URL
s for SQL Server via pyodbc only. Also, since now all those parent methods are exposed by my module I would feel obligated to document them for the user which leads to a lot of redundant documentation (I suppose I could just refer to SQLAlchemy's documentation for all other methods and attributes, or something). But the bottom-line is, all of this is somewhat undesirable.
Could it be that a parent-child relationship between URL
and MyURL
is actually not the right choice here, i.e. since it turns out we're not even interested in inheriting from URL
in the first place, is MyURL
semantically not a child of URL
?
2 Delegation
The implementation of delegation is almost identical to inheritance, except we obviously remove the parent class from MyURL
and replace the call to super
with the class name.
from sqlalchemy.engine import URL
class MyURL:
_DRIVERNAME = "mssql+pyodbc"
@classmethod
def from_hostname(cls, host, port, database):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return URL.create(**parts)
@classmethod
def from_dsn(cls, dsn):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": dsn
}
return URL.create(**parts)
@classmethod
def from_connection_string(cls, connection_string):
parts = {
"drivername": MyURL._DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return URL.create(**parts)
Usage:
MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')
This approach leaves MyURL
without all the baggage from URL
, and it doesn't imply a parent-child relationship. But it doesn't necessarily feel right either.
Is it overkill to create a class that does absolutely nothing other than encapsulate a few factory methods? Or maybe this is an anti-pattern, because we create a class MyURL
even though there is not much use for instances of type MyURL
(after all, we're only looking to create instances of URL
)?
3 Module-level factory functions
This is a pattern along the lines of SQLAlchemy's own make_url
factory function (which is essentially a wrapper around URL.create
). I can think of two ways to implement it.
3.A Multiple factory functions
The implementation of this one is pretty straightforward. It's again almost identical to inheritance and delegation, except of course the functions and attributes aren't wrapped in a class.
from sqlalchemy import URL
_DRIVERNAME = "mssql+pyodbc"
def url_from_hostname(host, port, database):
parts = {
"drivername": _DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return URL.create(**parts)
def url_from_dsn(dsn):
parts = {
"drivername": _DRIVERNAME,
"host": dsn
}
return URL.create(**parts)
def url_from_connection_string(connection_string):
parts = {
"drivername": _DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return URL.create(**parts)
Usage:
url_from_hostname('host', 1234, 'db')
url_from_dsn('my-dsn')
url_from_connection_string('Server=MyServer;Database=MyDatabase')
Does this create a somewhat "cluttered" module API? Is it again an anti-pattern to create a module API with separate functions that all do sort of the same thing, however? Shouldn't there be something that "connects" or "encapsulates" those clearly related functions (such as a class ...)?
3.B Single factory function
Trying to encapsulate all the different ways to create URL
s by a single function means that certain parameters are mutually exclusive (host
, port
and database
vs dsn
vs connection_string
). This makes the implementation a little more involved. Users will almost certainly make mistakes despite all documentation efforts, so one would probably want to validate the supplied function arguments and raise an exception if the combination of arguments doesn't make any sense. Decorators, as suggested here and here, seem like an elegant way to do that. Of course, the if
-elif
logic in the url
function could also be extended to do all that, so this is really just one (and probably not the best) possible implementation.
from functools import wraps
from sqlalchemy import URL
_DRIVERNAME = "mssql+pyodbc"
class MutuallyExclusiveError(Exception):
pass
def mutually_exclusive(*args, **kwargs):
excl_args = args
def inner(f):
@wraps(f)
def wrapper(*args, **kwargs):
counter = 0
for ea in excl_args:
if any(key in kwargs for key in ea):
counter += 1
if counter > 1:
raise MutuallyExclusiveError
return f(*args, **kwargs)
return wrapper
return inner
@mutually_exclusive(
["host", "port", "database"],
["dsn"],
["connection_string"]
)
def url(host=None, port=None, database=None, dsn=None, connection_string=None):
parts = {
"drivername": _DRIVERNAME,
"host": host or dsn,
"port": port,
"database": database
}
if host:
parts["query"] = {"driver": "ODBC Driver 17 or SQL Server"}
elif connection_string:
parts["query"] = {"odbc_connect": connection_string}
return URL.create(**parts)
Usage:
url(host='host', port=1234, database='db')
url(dsn='my-dsn')
url(connection_string='Server=MyServer;Database=MyDatabase')
If the user passes positional rather than keyword arguments they will completely bypass our validation, though, so that's an issue. Moreover, using positional arguments in an efficient manner isn't even really possible for DSNs and connection strings, unless one does something weird like url(None, None, None, 'my-dsn')
. One solution would be to disable positional arguments altogether by changing the function definition to def url(*, host=None, ...):
, thereby essentially discarding positional arguments. All of the above also doesn't quite feel right.
Is it bad practice when a function won't accept positional arguments? Is the whole concept of validating input not somewhat "un-pythonic", or does this merely refer to things like type checking? Is this generally just trying to force too much into a single function?
Any thoughts on all or part of the above (specifically questions raised in italics) would be very much appreciated.
Thanks!