In Python, is it possible to unpack a dict of keyword args in the definition of a function? As far as I can see, it is not possible because there are two independent definitions of double star syntax. Unpacking is only possible when a function is being called, and never when it is being defined. Is this true? If so, is there a way around it to accomplish something similar to what I want to do? In other words, can I override this behavior?
There are two uses of the double star **. One: ** can be used to pass a dict (unpacking it) into a function call. Two: **kwargs can be used when defining a function to collect an arbitrary number of keyword arguments. As far as I can tell, these are two completely independent (though logically consistent) uses of **.
A detailed description is here:
What does ** (double star) and * (star) do for Python parameters?
Simple examples of each.
def print_args(**kwargs):
    print(kwargs)

print_args(one='this', two='that')
# {'one': 'this', 'two': 'that'}
def print_kw(one=None, two=None):
    print(one)
    print(two)

print_kw(**{'one': 'this', 'two': 'that'})
# this
# that
What I'd like to do is:
packed_keywords = {'apple': 'red', 'peach': 'fuzzy'}

def print_unpacked_kw(**packed_keywords):
    print(apple)
    print(peach)

print_unpacked_kw()
# NameError: name 'apple' is not defined
# I'd like this to print "red fuzzy"
For comparison, here's an example of similar code without unpacking. This version works, but it does not use a dict for the keyword args the way I'd like.
def print_typed_kw(apple='red', peach='fuzzy'):
    print(apple)
    print(peach)

print_typed_kw()
# red
# fuzzy
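One workaround I'm aware of (a sketch only; `DEFAULTS` and `print_merged_kw` are names I made up) is to keep `**kwargs` in the definition and merge it over a dict of defaults inside the function, since ** unpacking is legal at call time:

```python
DEFAULTS = {'apple': 'red', 'peach': 'fuzzy'}  # hypothetical defaults dict

def print_merged_kw(**overrides):
    # Start from DEFAULTS, and let any keywords passed by the caller win.
    params = dict(DEFAULTS, **overrides)
    print(params['apple'])
    print(params['peach'])

print_merged_kw()
# red
# fuzzy
print_merged_kw(apple='green')
# green
# fuzzy
```

This doesn't bind the keys as local names the way I originally wanted, but it does let a single dict drive the defaults.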
EDIT: Why do I want to do this?
CONTEXT:
This explanation is highly specific to scikit-learn. If you are not familiar with this library, it may be best to ignore the rest of this context section. This question comes up in the context of writing a transformer class that will go inside a pipeline. Specifically, I am creating a transformer that will return a prediction from a regressor. My idea is to use this prediction as one feature in a Feature Union that will go into another downstream classifier.
One of the benefits of a pipeline is the ability to set parameters within a grid search to optimize the hyper-parameters. In my experience, the parameters of a user-defined estimator are only accessible this way if they are defined as arguments in the __init__ constructor of the estimator class. Here is my class:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor

class RandForestTransformer(BaseEstimator, TransformerMixin):
    """
    Takes a random forest (or could be any classifier) and uses
    predict as output for transform, which can then be used as
    a feature in another FeatureUnion and classifier.
    """
    def __init__(self,
                 n_estimators=10, criterion='mse',
                 max_depth=None, min_samples_split=2,
                 min_samples_leaf=1, min_weight_fraction_leaf=0.0,
                 max_features='auto', max_leaf_nodes=None,
                 bootstrap=True, oob_score=False, n_jobs=1,
                 random_state=None, verbose=0, warm_start=False):
        # Note: the __init__ arguments are not yet passed through here.
        self.rf = RandomForestRegressor()

    def fit(self, X, y):
        self.rf = self.rf.fit(X, y)
        return self

    def transform(self, X, y=None):
        return self.rf.predict(X)
I'd like to be able to pass a dict to the __init__ definition so that I can easily change individual parameters without redefining the entire class every time.
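Note that unpacking does work at call time, so one pattern (a sketch; `rf_params` and `MiniTransformer` are hypothetical names, with only two of the parameters above) is to keep the explicit __init__ signature and unpack a dict when instantiating:

```python
# Hypothetical: a dict holding only the parameters I want to change.
rf_params = {'n_estimators': 100, 'max_depth': 5}

class MiniTransformer:
    """Toy stand-in for RandForestTransformer with two of its parameters."""
    def __init__(self, n_estimators=10, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

# Unpacking a dict is legal when *calling* __init__, just not when defining it.
t = MiniTransformer(**rf_params)
print(t.n_estimators, t.max_depth)
# 100 5
```

This keeps the signature explicit (which the grid search machinery needs) while still letting a single dict drive the values.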
EDIT 2:
In regard to my specific problem, I'd like to give credit to @j-a for suggesting that I look at the scikit-learn BaseEstimator code.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
The class definition for BaseEstimator explicitly states that the parameters must be given as explicit keyword arguments in __init__, where they are introspected for pipeline use.
class BaseEstimator(object):
    """Base class for all estimators in scikit-learn

    Notes
    -----
    All estimators should specify all the parameters that can be set
    at the class level in their ``__init__`` as explicit keyword
    arguments (no ``*args`` or ``**kwargs``).
    """