2

I have a generic smoothing function, and based on a configuration file (yaml which will be loaded as a dictionary), different implementation (boxcar or gaussian) will be called. These implementations have different number of arguments, for example boxcar requires winsize, while gaussian require winsize and variance.

Here is my current implementation:

def smoothing(dataDf, selected_columns, kwargs):

    method = kwargs['method']

    if method == 'boxcar':
        boxcar(dataDf, selected_columns, kwargs['arguments'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['arguments'])
    else:
        raise NotImplementedError

Would there be a better way to implement this?

Dzung Nguyen
  • 3,794
  • 9
  • 48
  • 86
  • 1
    why not using a dictionnary instead of a function? I'm thinking about something like the accepted answer of this thread: https://stackoverflow.com/questions/2283210/python-function-pointer. Also, stackoverflow is not the best place for coding advices. – Liora Haydont Jan 05 '18 at 21:30
  • 1
    https://github.com/AvivC/genericfuncs – Eli Korvigo Jan 05 '18 at 21:31
  • 1
    You're looking for the [Strategy Pattern](https://stackoverflow.com/questions/963965/how-to-write-strategy-pattern-in-python-differently-than-example-in-wikipedia) – Peter Wood Jan 05 '18 at 21:55
  • Possible duplicate of [How to write Strategy Pattern in Python differently than example in Wikipedia?](https://stackoverflow.com/questions/963965/how-to-write-strategy-pattern-in-python-differently-than-example-in-wikipedia) – Peter Wood Jan 05 '18 at 21:56
  • @PeterWood strategy pattern is OOP, which is not the only way to address this issue. I would argue, it's not even a particularly nice way, because sometimes OOP overcomplicates stuff and strategy pattern is a testament to that, imho. – Eli Korvigo Jan 05 '18 at 21:58
  • @EliKorvigo I'm not sure I understand the distinction. In Python, every name refers to an object. How can you program without objects in Python? – Peter Wood Jan 05 '18 at 22:10
  • 1
    @PeterWood objects do not imply OOP style. Haskell, a pure (at least as it gets) functional language, has objects, too. It doesn't make it an OOP language. That's why Python is not a 'pure' OOP language – you are not forced to use OOP style (which surely has its rightful domain, which doesn't make it the only right way to do things). – Eli Korvigo Jan 05 '18 at 22:14
  • @EliKorvigo I'm not really advocating an OOP style. Many design patterns address shortcomings of the implementation languages. In C++/Java we create `Functors`, which are function objects, which support function call syntax. We have to create them explicitly and Patterns like `Strategy`, `State`, and `Command` depend upon them for their implementation. However, as functions in Python are already objects (and instances of classes which implement `__call__`, thus enabling function call syntax) we don't have to implement it ourselves. So we just use a function. But it's still the Strategy pattern – Peter Wood Jan 05 '18 at 22:33

4 Answers4

3

I would consider two options

  1. Use a dictionary of functions:

    methods = {
        'boxcar': function1,
        'gaussian': function2
    }
    
    try:
        method = methods[kwargs['method']]
        ...
    except KeyError:
        raise NotImplementedError
    

    You can make it a bit more user-friendly

    def smoothing(dataDf, selected_columns, method, *args, **kwargs):
        try:
            return methods[kwargs['method']](
                dataDf, selected_columns, *args, **kwargs
            )
        except KeyError:
            raise NotImplementedError('{} is not a valid method'.format(method))
    
  2. Use multiple dispatch. It lets you dispatch on function signature and types

    In [1]: from multipledispatch import dispatch
    
    In [2]: @dispatch(int, int)
       ...: def f(x, y):
       ...:     return x, y
       ...: 
    
    In [3]: @dispatch(int)
       ...: def f(x):
       ...:     return x
       ...: 
    
    In [5]: f(1)
    Out[5]: 1
    
    In [6]: f(1, 2)
    Out[6]: (1, 2)
    

    In your case

    @dispatch(...list some/all argument types here...)
    def smoothing(...signature for boxcar...):
        pass
    
    @dispatch(...list some/all argument types here...)
    def smoothing(...signature for gaussian...)
        pass
    
Eli Korvigo
  • 10,265
  • 6
  • 47
  • 73
1

I'd say that this question is primary opinion based, but I'm kinda... bored so here's my take:

In these cases I always tend to give priority to readability by other users. We all know that code should be properly commented, explained, with lots of documentation around, right? But I also should go to the gym and yet, here I am, writing this from the coziness of my sofa (which I'm not planning to abandon until... mid-April, more or less, when the weather gets better).

To me, if other people are going to read your code, I think it's very important to take advantage of the fact that Python, if properly written can be very, very clear (it's almost like pseudo-code that runs, right?)

So in your case, I wouldn't even create this kind-of-wrapper function. I would have a module smoothing.py containing all your smoothing functions. Not only that, I'd import the module (import smoothing) as opposed to from smoothing import boxcar, gaussian) so I can be very explicit on my calls:

if method == 'boxcar':
   smoothing.boxcar(whatever whatever...)  
   # Someone reading this will be able to figure out that is an smoothing
   # function from a module called `"smoothing"`. Also, no magic done with
   # things like my_func = getattr(smoothing, method)... none of that: be
   # clear and explicit. 
   # For instance, many IDEs allow you to navigate to the function's
   # definition, but for that to work properly, the code needs to be explicit 
elif method == 'gaussian':
   smoothing.gaussian(whatever whatever...)
else:
   raise ValueError(
          'Unknown smoothing method "%s".'
          ' Please see available method in "%s" or add'
          ' a new smoothing entry into %s' % (
                 method, 
                 os.path.abspath(smoothing.__file__), 
                 os.path.abspath(__file__)
          )
   )

Something like that. Something that if someone receives an error, can quickly understand where and why is it happening.

Otherwise, if you still want to keep your structure, I'd say that, since you're always going to need your 'method', don't put that into your kwargs. Make it positional:

def smoothing(method, dataDf, selected_columns, kwargs):
    if method == 'boxcar':
        boxcar(dataDf, selected_columns, kwargs['arguments'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['arguments'])
    else:
        raise NotImplementedError

Another thing you could do, is instead of having the possibility of having bad arguments in the kwargs dict, force it to include the proper arguments (what if someone passes in kwarg['arguments'] the argument method=boxcar but gives you kwarg['arguments'] for gaussian? Don't let that even be possible (make it crash as soon as possible):

def smoothing(method, dataDf, selected_columns, **kwargs):
    if method == 'boxcar':
        assert 'variance' not in kwargs  # If `boxcar` shouldn't have a "variance"
        boxcar(dataDf, selected_columns, kwargs['windsize'])
    elif method == 'gaussian':
        gaussian(dataDf, selected_columns, kwargs['windsize'], kwargs['variance'])
    else:
        raise NotImplementedError

And always provide a proper message in your exceptions (give a proper explanation on your NotImplementedError)

There's a lot of "magic" you can do with Python. That doesn't mean you have to do it. For instance, you could get a fairly similar behavior to what your implementation of the smoothing function is doing by writing something like this:

def smoothing(dataDf, selected_columns, kwargs):
    return globals().get(
        kwargs.pop('method') or 'NOT_IMPLEMENTED_FOR_SURE!!', 
        lambda *_: (_ for _ in ()).throw(NotImplementedError())
    )(dataDf, selected_columns, kwargs['arguments'])

But if someone else reads that... well... good luck to that person :-P

Savir
  • 17,568
  • 15
  • 82
  • 136
1
methods = {
    'boxcar': boxcar,
    'gaussian': gaussian,
}

MESSAGES = {
    'MISSING_METHOD': 'No method named {} was found.'
}

def smooth(dataDf, selected_columns, **kwargs):
    """Smooth dataframe columns."""
    # Here, we are providing a default if the
    # user doesn't provide a keyword argument
    # for `method`. You're accessing it with
    # brackets and if it's not provided it will
    # raise a KeyError. If it's mandatory, put
    # it as such. But you can do better by
    # providing a default. Like below

    method_name = kwargs.get('method', 'gaussian')
    method = methods.get(method_name)

    if method is None:
        msg = MESSAGES['MISSING_METHOD'].format(method_name)
        raise NotImplementedError(msg)
    return method(dataDf, selected_columns, **kwargs)

If method is a fairly important part of the call and the user is aware of the argument, you can write it as follows:

def smooth(dataDf, selected_columns, method='gaussian', **kwargs):
    """Smooth dataframe columns."""
    # Note that kwargs no longer contains `method`;
    # our call to the 'real' method is not 'polluted'.

    _method = methods.get(method)

    if _method is None:
        msg = MESSAGES['MISSING_METHOD'].format(method)
        raise NotImplementedError(msg)
    return _method(dataDf, selected_columns, **kwargs)

However, there's a problem in the methods dictionary:

  • Its keys are function names and values are function objects, which requires us to have access to the functions.

You want to get a dictionary where keys and values are strings (such as one you would get from a YAML file).

The assumption I will make is that the functions exist in the current context, whether you have defined them or imported them.

from third.party.library import really_ugly_name_you_didnt_choose_but_imported

def gaussian(dataDf, selected_columns, **kwargs):
    pass

_methods = globals()

# This is a dictionary where keys and values
# only contain strings, like from a YAML file.

methods = {
    'boxcar': 'boxcar',
    'gaussian': 'gaussian',
    'roundcar': 'really_ugly_name_you_didnt_choose_but_imported',
}

MESSAGES = {
    'MISSING_METHOD': 'No method named {} was found.'
}

def smooth(dataDf, selected_columns, method='gaussian', **kwargs):
    """Smooth dataframe columns."""

    # Lets check if it is in authorized methods
    # We're using globals() and we don't want
    # the user to accidentally use a function
    # that has no relation to smoothing.

    # So we're looking at the dictionary from
    # the YAML file.

    # Let's get the "real name" of the function
    # so if `method` were (str) 'roundcar', `method_name`
    # would be (str) 'really_ugly_name_you_didnt_choose_but_imported'

    method_name = methods.get(method)

    # Now that we have the real name, let's look for
    # the function object, in _methods.

    _method = _methods.get(method_name)

    if None in (_method, method_name):
        msg = MESSAGES['MISSING_METHOD'].format(method)
        # Note that we raise the exception for the function
        # name the user required, i.e: roundcar, not
        # the real function name the user might be unaware
        # of, 'really_ugly_name_you_didnt_choose_but_imported'.
        raise NotImplementedError(msg)
    return _method(dataDf, selected_columns, **kwargs)
Jugurtha Hadjar
  • 441
  • 3
  • 7
0

Your algorithm functions are Strategies. You can store them in a dictionary for easy lookup.

Use a defaultdict with a "not implemented" Strategy for missing algorithms:

def algorithm_not_implemented(*args, **kwargs):
    raise NotImplementedError

algorithms = defaultdict(algorithm_not_implemented)

This means if you try and access a non-existent algorithm, it will return algorithm_not_implemented and when you call it, it will raise NotImplementedError:

>>> algorithms['pete'](1, 2, 3)
Traceback (most recent call last):
NotImplementedError

You can add your algorithms:

algorithms['boxcar'] = boxcar
algorithms['gaussian'] = gaussian

And you can call them:

def smoothing(dataDf, selected_columns, kwargs):
    method = kwargs['method']
    arguments = kwargs['arguments']

    algorithms[method](dataDf, selected_columns, arguments)
Peter Wood
  • 23,859
  • 5
  • 60
  • 99
  • You were preaching about strategy pattern, yet your answer has little to do with it, though it solves the same problem. And how is it different from several dict-based answers, that have already been posted? – Eli Korvigo Jan 05 '18 at 22:23
  • @EliKorvigo not sure I was preaching, sorry if it came across that way. I think my solution is unique in that it uses a `defaultdict` – Peter Wood Jan 05 '18 at 22:39