
TL;DR

This question examines an over-engineered use of Python metaclasses and dataclasses to create a LiteralEnum (for validating a stringly-typed keyword argument), such as LinkageMethod, and a KeywordArgumentsBaseClass for writing wrappers around SciPy methods, such as SciPyLinkage. The author would like to know how best to decide when something should be a property, staticmethod, classmethod, or instance method.

Why would someone do this?

  • to override the default keyword arguments of SciPy methods
  • to expose keyword arguments that might otherwise be hidden under **kwargs (and passed on to another method), for a better developer experience.
  • to modify the default behavior of SciPy methods, e.g. add optional preprocessing / postprocessing, while still being able to distinguish which parameters belong to the method and which belong to the custom handling.

Disclaimer

Given the explanation above, there is a lot of code and the M*.W.E. is not very minimal (complexity is one of the key reasons to avoid metaclasses, especially in Python, which favors simplicity and readability).

Question(s)

Newbie question

I am new to using metaclasses. Are the LiteralEnum classes at least "pythonic"?

staticmethod vs classmethod vs property at the metaclass / class level?

The KeywordArgumentsMeta and KeywordArgumentsMixin classes set up some useful attributes for retrieving a dictionary of keyword arguments, and KeywordArgumentsBaseClass combines KeywordArgumentsMixin with ClassMethodSignatureMixin.
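The class-level attributes such as `keywords` and `defaults` work because they are properties defined on the metaclass: Python resolves `ChildExample.keywords` through `type(ChildExample)`, but attribute lookup on an instance never consults the metaclass. That is why the instance-level properties (`kwargs` and friends) have to go through `type(self)`. A minimal sketch with the hypothetical names `Meta` and `Example`:

```python
class Meta(type):
    @property
    def defaults(cls) -> dict:
        # defined on the metaclass, so it is found when looking up Example.defaults
        return {"strvar": "default"}


class Example(metaclass=Meta):
    pass


print(Example.defaults)                # {'strvar': 'default'}
print(hasattr(Example(), "defaults"))  # False: instance lookup skips the metaclass
print(type(Example()).defaults)        # {'strvar': 'default'}
```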

This is where I am conflicted:

@dataclass
class BaseExample(KeywordArgumentsBaseClass):
    _: KW_ONLY
    strvar: str = 'default'
    intvar: int = 2


@dataclass
class ChildExample(BaseExample):
    _: KW_ONLY
    thirdvar: str = 'three'
    fourth: int = 4


ChildExample.keywords
> ['thirdvar', 'fourth']

ChildExample.ikeywords
> ['strvar', 'intvar']

ChildExample.akeywords
> ['strvar', 'intvar', 'thirdvar', 'fourth']


ChildExample.defaults
> {'thirdvar': 'three', 'fourth': 4}

...


ChildExample().kwargs
> {'thirdvar': 'three', 'fourth': 4}
...


ChildExample().params(**{'thirdvar': 'new', 'banana': 3})
> {'thirdvar': 'new', 'fourth': 4}

I am conflicted because I want to make a wrapper for SciPy methods:


@dataclass
class SciPyMethod(KeywordArgumentsBaseClass):
    _: KW_ONLY

    @classmethod
    def get_method(cls):
        # subclasses return the SciPy callable they wrap
        raise NotImplementedError

    # attempt 1: classmethod that builds a throwaway default instance
    @classmethod
    def call_scipy(cls, **kws):
        inst = cls()
        method = cls.get_method()
        params = inst.prepare_params(func=method, scope=locals(), **kws)
        raise NotImplementedError  # goal: return method(**params)

    # attempt 2: instance method
    # NOTE: this second definition replaces the classmethod above in the class namespace
    def call_scipy(self, **kwargs):
        method = type(self).get_method()
        params = self.prepare_params(func=method, scope=locals(), **kwargs)
        print(params)
        raise NotImplementedError  # goal: return method(**params)

    def __call__(self, x: NPArray, **kwargs) -> NPArray:
        method = self.get_method()
        ...

but I need both classmethods and instance methods for this to work.

Since there are class-level properties (on the metaclass) for getting the default params, instance properties and methods for getting the current params, and the prepare_params method for getting the params that match a function signature, how can I make call_scipy work both as a classmethod and as an instance method?

How could this be simplified or made more pythonic?

Usefulness of ClassMethodSignatureMixin

While ClassMethodSignaturePriority seems useful at first glance, I am not actually sure it is useful at all. Consider:

@dataclass
class Example(KeywordArgumentsBaseClass):  # prepare_params is defined on KeywordArgumentsBaseClass
    _: KW_ONLY
    test_var: str = 'default'

    def foo(self, test_var: Optional[str] = None, **kwargs):
        params = self.prepare_params(func=self.foo, scope=locals(), **kwargs)
        print(params)
        return params

Without knowing the function signature in advance, the prepare_params method can handle keywords that are explicitly named in func as well as those passed in via **kwargs.

However, test_var must either be defined on the class, passed as a positional-or-keyword argument, or passed via **kwargs. Python itself prevents Example().foo(test_var='fine', **{'test_var': 'causes error'}) by raising a TypeError.
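Both halves of that claim can be checked with a plain function (the `foo` below is a stand-alone stand-in for the method): a named parameter is bound directly when it arrives through `**` unpacking, so it never shows up in `**kwargs`, and supplying it twice is rejected at call time.

```python
def foo(test_var=None, **kwargs):
    return test_var, kwargs


# a named parameter is bound from ** unpacking and never lands in **kwargs
print(foo(**{"test_var": "x", "other": 1}))  # ('x', {'other': 1})

error = None
try:
    foo(test_var="fine", **{"test_var": "causes error"})
except TypeError as exc:
    error = exc
    print(exc)  # ...got multiple values for keyword argument 'test_var'
```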

On the other hand, the prepare_params method is useful because it keeps only the class's keyword-only fields and looks up named arguments of func in the local scope, which makes sure that, in the case of the foo method, the value of test_var ends up in params.

To restate more cleanly: given a function with an unknown number of named keyword arguments (like test_var in foo), prepare_params uses locals() together with **kwargs so that there is a single dictionary in which to look up the values of the keyword arguments.
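The same idea can be sketched without the mixins. `collect_params` below is a hypothetical helper, not part of the question's code. It uses `inspect.signature`, which the Python docs recommend over `getfullargspec` for callable introspection, and applies the same per-key priority as `prioritize='arg'`: a non-None named argument found in the scope, then a non-None value in `**kwargs`, then the default. Unknown keys are dropped.

```python
import inspect
from typing import Any, Callable, Dict


def collect_params(func: Callable, scope: Dict[str, Any], defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: one dict of values for the keys in `defaults`."""
    named = inspect.signature(func).parameters
    params = dict(defaults)
    for key in params:
        if key in named and scope.get(key) is not None:
            params[key] = scope[key]  # explicitly named argument of func
        elif kwargs.get(key) is not None:
            params[key] = kwargs[key]  # value that arrived via **kwargs
    return params


DEFAULTS = {"test_var": "default", "other": 1}


def foo(test_var=None, **kwargs):
    # locals() captures the explicitly named argument, kwargs carries the rest
    return collect_params(foo, locals(), DEFAULTS, **kwargs)


print(foo())                   # {'test_var': 'default', 'other': 1}
print(foo(test_var="new"))     # {'test_var': 'new', 'other': 1}
print(foo(other=5, banana=3))  # {'test_var': 'default', 'other': 5}
```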

Code

Imports

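import os, inspect
import numpy as np, pandas as pd, scipy as sp
import scipy.cluster.hierarchy  # makes sp.cluster.hierarchy available explicitly
from pandas import DataFrame
from dataclasses import dataclass, KW_ONLY
from enum import Enum, StrEnum, EnumMeta, auto  # StrEnum requires Python 3.11+
from typing import Optional, Callable, List, Tuple, Any, Dict, Union, Literal

NPArray = np.ndarray  # assumed alias for the NPArray annotations used below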

LiteralEnum

MetaClass

class LiteralEnumMeta(EnumMeta):
    '''LiteralEnumMeta

    See Also:
    --------
    - https://stackoverflow.com/questions/43730305/when-should-i-subclass-enummeta-instead-of-enum
    - https://peps.python.org/pep-3115/
    - https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/#class-attribute-lookup
    '''

    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        enum_dict = super().__prepare__(name, bases, **kwargs)
        #print('PREPARE: <enum_dict> = \t', enum_dict)

        # NOTE: this will throw an error since we are using StrEnum
        # enum_dict['_default'] = None
        return enum_dict

    def __init__(cls, clsname, bases, clsdict, **kwargs):
        super().__init__(clsname, bases, clsdict, **kwargs) 
        # print('INIT: <clsdict> = \t', clsname, clsdict) 

    def __new__(
        metacls, cls, bases, clsdict, *,
        default: Optional[str] = None, elements: Optional[List[str]] = None,
        **kwargs,
    ):
        # print('NEW: <clsdict> = \t', cls, clsdict)
        if elements is not None:
            for element in elements:
                clsdict[element.upper()] = auto()

        # forward any remaining class keywords (e.g. `boundary`) to EnumMeta.__new__
        new_cls = super().__new__(metacls, cls, bases, clsdict, **kwargs)

        # NOTE: this will result in TypeError: cannot extend 
        if default:            
            setattr(new_cls, '_default', default)
        
        return new_cls  

    @property
    def members(cls):
        # NOTE: could also use cls._member_names_
        return [member.name for member in cls]

    @property
    def values(cls):
        return [member.value for member in cls]
    
    @property
    def items(cls):
        return list(zip(cls.members, cls.values))

LiteralEnum

class LiteralEnum(StrEnum, metaclass=LiteralEnumMeta):
    @classmethod
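    def _missing_(cls, value):
        # case-insensitive match against the member values
        if isinstance(value, str):
            for member in cls:
                if member.value.lower() == value.lower():
                    return member

        # fall back to the member named by `_default` (set via @enum_default), if any;
        # returning None makes Enum raise the usual ValueError
        default_name = getattr(cls, '_default', None)
        return getattr(cls, default_name, None) if default_name else None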

Decorators

def enum_default(default: str = ''):
    def wrapper(cls):
        cls._default = default
        return cls
    return wrapper

def enum_set_attr(name: str = 'attr', attr: str = 'data'):
    def wrapper(cls):
        setattr(cls, f'_{name}', attr)
        return cls
    return wrapper

def set_method(method):
    def decorator(cls):
        cls.method = method
        return cls
    return decorator

SciPy LiteralEnum Examples

Linkage

@enum_default('SINGLE')
class LinkageMethod(LiteralEnum):
    '''
    See Also
    --------    
    scipy.cluster.hierarchy.linkage : Performs hierarchical/agglomerative clustering on the condensed distance matrix y.
        https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
    '''
    SINGLE = auto()
    COMPLETE = auto()
    AVERAGE = auto()
    WEIGHTED = auto()
    CENTROID = auto()
    MEDIAN = auto()
    WARD = auto()

PDistMetric

@enum_default('EUCLIDEAN')
class PDistMetric(LiteralEnum):
    '''
    See Also
    -------- 
    scipy.spatial.distance.pdist : Compute the pairwise distances between observations in n-dimensional space.
        https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist
    '''
    BRAYCURTIS = auto()
    CANBERRA = auto()
    CHEBYSHEV = auto()
    CITYBLOCK = auto()
    CORRELATION = auto()
    COSINE = auto()
    DICE = auto()
    EUCLIDEAN = auto()
    HAMMING = auto()
    JACCARD = auto()
    JENSENSHANNON = auto()
    KULCZYNSKI1 = auto()
    MAHALANOBIS = auto()
    MATCHING = auto()
    MINKOWSKI = auto()
    ROGERSTANIMOTO = auto()
    RUSSELLRAO = auto()
    SEUCLIDEAN = auto()
    SOKALMICHENER = auto()
    SOKALSNEATH = auto()
    SQEUCLIDEAN = auto()
    YULE = auto()

ScoreMethod

@enum_default('ZSCORE')
class ScoreMethod(LiteralEnum):
    '''
    See Also
    --------
    scipy.stats.zscore : Compute the z-score.    
    scipy.stats.gzscore : Compute the geometric standard score.
    '''
    ZSCORE = auto()
    GZSCORE = auto()

ClassMethodSignaturePriority

@enum_default('OBJ')
@enum_set_attr('attr', 'data')
class ClassMethodSignaturePriority(LiteralEnum):    
    OBJ = auto()
    ARG = auto()
    KWS = auto()

    def get(self, obj: object, attr: Optional[str] = None, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        match self:
            # try and get `attr` from `obj` defaulting back to `arg`
            case ClassMethodSignaturePriority.OBJ:
                val = getattr(obj, attr, arg)
                if val is None:
                    return ClassMethodSignaturePriority('ARG').get(obj, attr, arg, **kws)
            
            # use `arg` as is unless it is None, then fall back to KWS (and finally to `obj`)
            case ClassMethodSignaturePriority.ARG:
                val = arg
                if val is None:
                    return ClassMethodSignaturePriority('KWS').get(obj, attr, arg, **kws)                    
                
            # use `kws` assuming `attr` is in `kwargs` falling back to arg then try and get `attr` from `obj`
            case ClassMethodSignaturePriority.KWS:
                val = kws.get(attr, arg)   
            
            case _:
                pass

        if val is None:
            val = getattr(obj, attr, arg)

        if isinstance(val, (list, np.ndarray, )):
            val = np.asanyarray(val)

        return val

    @classmethod
    def prioritize(cls, obj: object, attr: str, arg: Optional[Any] = None, priority: Literal['obj', 'arg', 'kws'] = 'obj', **kws) -> Union[NPArray, DataFrame, Any]:
        return cls(priority).get(obj, attr, arg, **kws)

    @classmethod
    def _pobj(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'obj', **kws)

    @classmethod
    def _pargs(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'arg', **kws)

    @classmethod
    def _pkws(cls, obj: object, attr: str, arg: Optional[Any] = None, **kws) -> Union[NPArray, DataFrame, Any]:
        return cls.prioritize(obj, attr, arg, 'kws', **kws)
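Reading the `get` method, each priority amounts to trying three sources in a fixed order and keeping the first non-None value: OBJ tries (obj, arg, kws), ARG tries (arg, kws, obj), and KWS tries (kws, arg, obj). A sketch of that reading as a plain function, with the hypothetical name `resolve`; it leaves out the list-to-`np.ndarray` conversion.

```python
from typing import Any, Dict, Iterable, Optional


def resolve(obj: object, attr: str, arg: Any = None, kws: Optional[Dict[str, Any]] = None,
            order: Iterable[str] = ("arg", "kws", "obj")) -> Any:
    """Hypothetical helper: return the first non-None value, trying sources in `order`."""
    kws = kws or {}
    sources = {
        "arg": lambda: arg,
        "kws": lambda: kws.get(attr),
        "obj": lambda: getattr(obj, attr, None),
    }
    for name in order:
        value = sources[name]()  # evaluated lazily, one source at a time
        if value is not None:
            return value
    return None


class Obj:
    data = "from-obj"


print(resolve(Obj(), "data", "from-arg", {"data": "from-kws"}))                         # from-arg
print(resolve(Obj(), "data", None, {"data": "from-kws"}))                               # from-kws
print(resolve(Obj(), "data", "from-arg", {"data": "from-kws"}, ("obj", "arg", "kws")))  # from-obj
```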

Mixin

@dataclass
class ClassMethodSignatureMixin:
    def get_val(self, attr: str, arg: Optional[Any] = None, prioritize: Union[Literal['obj', 'arg', 'kws'], ClassMethodSignaturePriority] = 'arg', **kws):    
        # by default we will prioritize `arg` over `self` as `arg` might overwrite `self`'s attribute
        # arg --(fallbacks to)--> kws --(fallbacks to)--> self
        priority = ClassMethodSignaturePriority(prioritize)
        return priority.get(self, attr=attr, arg=arg, **kws)

    def _prioritize_kws(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='kws', **kws)

    def _prioritize_arg(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='arg', **kws)

    def _prioritize_obj(self, attr: str, arg: Optional[Any] = None, **kws):
        return self.get_val(attr, arg, prioritize='obj', **kws)
    
    def get_arg(self, attr: str, func: Callable, scope: Dict[str, Any]):
        args = inspect.getfullargspec(func).args
        if attr in args and attr in scope:
            return scope[attr]
        return None
    
    def get_tuple(self, attr: str, func: Callable, scope: Dict[str, Any], **kws) -> Tuple[Any, Any, Any]:
        obj = getattr(self, attr, None)
        arg = self.get_arg(attr, func, scope)
        kwa = kws.get(attr, None)
        return obj, arg, kwa
    
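    def update_params(self, **kws):
        # NOTE: aparams() and kwargs come from KeywordArgumentsMixin
        params = self.aparams()
        for k in self.kwargs:  # iterate over the keys; each value is re-resolved below
            params[k] = self.get_val(attr=k, prioritize='kws', **kws)
        return params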
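Each `_prioritize_*` helper only pins the `prioritize` argument of `get_val`, so `functools.partialmethod` can declare them in one line each. The sketch uses a simplified stand-in `get_val` (hypothetical `Resolver` class) rather than the mixin's enum-based one.

```python
from functools import partialmethod
from typing import Any, Optional


class Resolver:
    data = "from-obj"

    def get_val(self, attr: str, arg: Optional[Any] = None, prioritize: str = "arg", **kws: Any) -> Any:
        # simplified stand-in: first non-None value in the chosen order
        obj_val = getattr(self, attr, None)
        order = {
            "arg": (arg, kws.get(attr), obj_val),
            "kws": (kws.get(attr), arg, obj_val),
            "obj": (obj_val, arg, kws.get(attr)),
        }[prioritize]
        return next((v for v in order if v is not None), None)

    # each helper is get_val with `prioritize` pre-filled
    _prioritize_kws = partialmethod(get_val, prioritize="kws")
    _prioritize_arg = partialmethod(get_val, prioritize="arg")
    _prioritize_obj = partialmethod(get_val, prioritize="obj")


r = Resolver()
print(r._prioritize_kws("data", "from-arg", data="from-kws"))  # from-kws
print(r._prioritize_arg("data", "from-arg", data="from-kws"))  # from-arg
print(r._prioritize_obj("data", "from-arg"))                   # from-obj
```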

KeywordArguments

KeywordArgumentsMeta

class KeywordArgumentsMeta(type):    
    @staticmethod
    def get_annots_kws(cls) -> list:
        '''Get annotated keyword only argument names'''
        annots = list(cls.__annotations__.keys())
        if '_' not in annots:
            return []
        return annots[annots.index('_') + 1:] 

    @staticmethod
    def get_cls_kws(cls) -> list:
        '''
        NOTES
        -----
        - if using inheritance this will get all keyword only arguments
        '''
        return inspect.getfullargspec(cls.__init__).kwonlyargs

    @staticmethod
    def attr_dict(obj: object, attrs: list) -> dict:
        return dict((k, getattr(obj, k, None)) for k in attrs)

    @staticmethod
    def inst_dict(inst: object, attr: str = 'defaults'):        
        attrs = getattr(type(inst), attr).items()
        return dict((k, getattr(inst, k, v)) for k, v in attrs)
                
    @property
    def keywords(cls) -> list:        
        '''Get current keyword only argument names'''
        return cls.get_annots_kws(cls)

    @property
    def ikeywords(cls) -> list:
        '''Get inherited keyword only argument names'''
        ignore = cls.keywords
        result = list()
        is_new = lambda kw: kw not in result and kw not in ignore
        for c in inspect.getmro(cls):
            if c is not object:
                new_kws = cls.get_annots_kws(c)
                result.extend(list(filter(is_new, new_kws)))
        return result
    
    @property
    def akeywords(cls) -> list:
        '''Get all keyword only argument names'''
        result = list()
        is_new = lambda kw: kw not in result
        for c in inspect.getmro(cls):
            if c is not object:
                new_kws = cls.get_annots_kws(c)
                result.extend(list(filter(is_new, new_kws)))
        return result
    
    @property
    def defaults(cls) -> dict:
        '''Get default keyword arguments only values'''
        instance = cls()
        return cls.attr_dict(instance, cls.keywords)  

    @property
    def idefaults(cls) -> dict:
        '''Get inherited default keyword arguments only values'''
        instance = cls()
        return cls.attr_dict(instance, cls.ikeywords)

    @property
    def adefaults(cls) -> dict:
        '''Get all default keyword arguments only values'''
        instance = cls()
        return cls.attr_dict(instance, cls.akeywords)

KeywordArgumentsMixin

@dataclass
class KeywordArgumentsMixin(metaclass=KeywordArgumentsMeta):
    _: KW_ONLY

    @property
    def kwargs(self) -> dict:
        '''Get instance specific default keyword arguments only values'''
        return type(self).inst_dict(self, attr='defaults')
        
    @property
    def ikwargs(self) -> dict:
        '''Get instance inherited default keyword arguments only values'''
        return type(self).inst_dict(self, attr='idefaults')

    @property
    def akwargs(self) -> dict:
        '''Get instance all default keyword arguments only values'''
        return type(self).inst_dict(self, attr='adefaults')

    def _merge_kws_to_dict(self, params: dict, **kwargs) -> dict:
        '''Only overwrite values in params with kwargs if key is in params'''
        values = params.copy()
        values.update(dict((k, v) for k, v in kwargs.items() if k in values))
        return values

    def params(self, **kwargs) -> dict:
        '''Get instance default keyword arguments only values but update with kwargs'''
        return self._merge_kws_to_dict(self.kwargs, **kwargs)

    def iparams(self, **kwargs) -> dict:
        '''Get instance inherited keyword arguments only values but update with kwargs'''
        return self._merge_kws_to_dict(self.ikwargs, **kwargs)
        
    def aparams(self, **kwargs) -> dict:
        '''Get instance all default keyword arguments only values but update with kwargs'''
        return self._merge_kws_to_dict(self.akwargs, **kwargs)
    

KeywordArgumentsBaseClass

@dataclass
class KeywordArgumentsBaseClass(KeywordArgumentsMixin, ClassMethodSignatureMixin):
    def prepare_params(self, func: Optional[Callable] = None, scope: Optional[Dict[str, Any]] = None, **kws) -> dict:
        params = self.aparams()

        for k, v in self.akwargs.items():
            
            arg = None
            if func and scope:
                arg = self.get_arg(attr=k, func=func, scope=scope)
            
            v = self.get_val(attr=k, arg=arg, prioritize='arg', **kws)
            params[k] = v

        return params

SciPyLinkage

#| export
@dataclass
class SciPyLinkage(KeywordArgumentsBaseClass):
    _: KW_ONLY
    
    method: LinkageMethod = LinkageMethod.SINGLE
    metric: PDistMetric = PDistMetric.CORRELATION
    optimal_ordering: bool = True

    def __post_init__(self):        
        self.method = LinkageMethod(self.method)
        self.metric = PDistMetric(self.metric)

    def __call__(self, x: NPArray, **kwargs) -> NPArray:
        l_func = sp.cluster.hierarchy.linkage
        params = self.prepare_params(func=l_func, scope=locals(), **kwargs)
        print('LINKAGE', params)
        linkage = l_func(x, **params)
        return linkage

SumNeuron

  • sorry. I won't be voting to close, but I think this is not really answerable in the scope of stackoverflow questions (Of course, someone could always step in to do so). I'd suggest contracting a couple hours of consulting to get briefed into this - as it won't be something easy to track or digest with gratis resources. – jsbueno Aug 10 '23 at 14:10
  • (a) this all needs to be deleted and replaced with a stubs library, but (b) this question needs to be moved to Code Review from Stack Overflow. – Reinderien Aug 10 '23 at 14:12
  • Whether you get a useful answer, depends on who sees it, and how much time they are willing to spend reading, thinking, and writing. I agree this looks more like a CR topic, dealing with code style rather than a specific bug. – hpaulj Aug 10 '23 at 14:29
  • This is too open-ended / opinion-based to really have a useful answer or be OT here. – BadZen Aug 27 '23 at 17:10
  • I’m voting to close this question because of user suggestion – SumNeuron Aug 27 '23 at 18:02

0 Answers