In Python, is it possible to unpack a dict of keyword args in the definition of a function? As far as I can see, it is not possible because there are two independent definitions of double star syntax. Unpacking is only possible when a function is being called, and never when it is being defined. Is this true? If so, is there a way around it to accomplish something similar to what I want to do? In other words, can I override this behavior?
There are two uses of the double star **. One: ** can be used to pass a dict (unpacking it) into a function call. Two: **kwargs can be used when defining a function to collect an arbitrary number of keyword arguments. As far as I can tell, these are two completely independent (though logically consistent) uses of **.
A detailed description is here:
What does ** (double star) and * (star) do for Python parameters?
Simple examples of each.
def print_args(**kwargs):
    print(kwargs)

print_args(one='this', two='that')
# {'one': 'this', 'two': 'that'}
def print_kw(one=None, two=None):
    print(one)
    print(two)

print_kw(**{'one': 'this', 'two': 'that'})
# this
# that
What I'd like to do is:
packed_keywords = {'apple': 'red', 'peach': 'fuzzy'}

def print_unpacked_kw(**packed_keywords):
    print(apple)
    print(peach)

print_unpacked_kw()
# NameError: name 'apple' is not defined
# I'd like this to print "red fuzzy"
For comparison, here's an example of similar code without unpacking. This version works, but it does not use a dict for the keyword args the way I'd like.
def print_typed_kw(apple='red', peach='fuzzy'):
    print(apple)
    print(peach)

print_typed_kw()
# red
# fuzzy
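One workaround I'm aware of (a sketch only; `DEFAULTS` and `print_merged_kw` are names I made up) is to keep `**kwargs` in the definition and merge it over a dict of defaults inside the function, since ** unpacking is legal at call time:

```python
DEFAULTS = {'apple': 'red', 'peach': 'fuzzy'}  # hypothetical defaults dict

def print_merged_kw(**overrides):
    # Start from DEFAULTS, and let any keywords passed by the caller win.
    params = dict(DEFAULTS, **overrides)
    print(params['apple'])
    print(params['peach'])

print_merged_kw()
# red
# fuzzy
print_merged_kw(apple='green')
# green
# fuzzy
```

This doesn't bind the keys as local names the way I originally wanted, but it does let a single dict drive the defaults.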
EDIT: Why do I want to do this?
CONTEXT:
This explanation is highly specific to scikit-learn. If you are not familiar with this library, it may be best to ignore the rest of this context section. This question comes up in the context of writing a transformer class that will go inside a pipeline. Specifically, I am creating a transformer that will return a prediction from a regressor. My idea is to use this prediction as one feature in a Feature Union that will go into another downstream classifier.
One of the benefits of a pipeline is the ability to set parameters within a grid search to optimize the hyper-parameters. In my experience, the parameters of a user-defined estimator are only accessible this way if they are defined as arguments in the __init__ constructor of the estimator class. Here is my class:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor

class RandForestTransformer(BaseEstimator, TransformerMixin):
    """
    Takes a random forest (or could be any classifier) and uses
    predict as output for transform, which can then be used as
    a feature in another FeatureUnion and classifier.
    """
    def __init__(self,
                 n_estimators=10, criterion='mse',
                 max_depth=None, min_samples_split=2,
                 min_samples_leaf=1, min_weight_fraction_leaf=0.0,
                 max_features='auto', max_leaf_nodes=None,
                 bootstrap=True, oob_score=False, n_jobs=1,
                 random_state=None, verbose=0, warm_start=False):
        # Note: the __init__ arguments are not yet passed through here.
        self.rf = RandomForestRegressor()

    def fit(self, X, y):
        self.rf = self.rf.fit(X, y)
        return self

    def transform(self, X, y=None):
        return self.rf.predict(X)
I'd like to be able to pass a dict to the __init__ definition so that I can easily change individual parameters without redefining the entire class every time.
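Note that unpacking does work at call time, so one pattern (a sketch; `rf_params` and `MiniTransformer` are hypothetical names, with only two of the parameters above) is to keep the explicit __init__ signature and unpack a dict when instantiating:

```python
# Hypothetical: a dict holding only the parameters I want to change.
rf_params = {'n_estimators': 100, 'max_depth': 5}

class MiniTransformer:
    """Toy stand-in for RandForestTransformer with two of its parameters."""
    def __init__(self, n_estimators=10, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

# Unpacking a dict is legal when *calling* __init__, just not when defining it.
t = MiniTransformer(**rf_params)
print(t.n_estimators, t.max_depth)
# 100 5
```

This keeps the signature explicit (which the grid search machinery needs) while still letting a single dict drive the values.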
EDIT 2:
In regard to my specific problem, I'd like to give credit to @j-a for suggesting that I look at the scikit-learn BaseEstimator code.
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
The class definition for BaseEstimator explicitly states that the parameters must be given as explicit keyword arguments in __init__, where they are introspected for pipeline use.
class BaseEstimator(object):
    """Base class for all estimators in scikit-learn

    Notes
    -----
    All estimators should specify all the parameters that can be set
    at the class level in their ``__init__`` as explicit keyword
    arguments (no ``*args`` or ``**kwargs``).
    """