Consider the following dataframe df, in which the features column holds a string of comma-separated feature names from a dataset (df can potentially be large):
index  features
1      'f1'
2      'f1, f2'
3      'f1, f2, f3'
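To reproduce the example, here is a minimal sketch that builds the df shown above. Naming the index `index` is an assumption taken from the table header; the real data may be named or indexed differently.

```python
import pandas as pd

# Rebuild the example frame: each cell of 'features' is a single string of
# feature names separated by ", "; the index starts at 1 as in the table.
df = pd.DataFrame(
    {'features': ['f1', 'f1, f2', 'f1, f2, f3']},
    index=pd.Index([1, 2, 3], name='index'),
)
print(df)
```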
I also have a function get_weights that accepts a comma-separated string of feature names and returns a list (here, a NumPy array) containing one weight for each given feature. The implementation details are not important; for simplicity, assume the function returns equal weights for all features:
import numpy as np

def get_weights(features):
    # Split on ", " and give each feature an equal share of the total weight
    features = features.split(', ')
    return np.ones(len(features)) / len(features)
Using pandas, how can I apply get_weights to df and get the results in a new dataframe like the one below?
index  f1    f2    f3
1      1     0     0
2      0.5   0.5   0
3      0.33  0.33  0.33
That is, in the resulting dataframe, the feature names in df.features become columns, and each cell holds that feature's weight for that row (0 where the row does not contain the feature).
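One possible approach, sketched under the assumption that feature names are always separated by exactly ", " (the same assumption get_weights makes): build one dict per row that maps each feature name to its weight, let the DataFrame constructor align the dict keys into columns, and fill the missing features with 0. The `records` and `result` names are illustrative, not part of the question.

```python
import numpy as np
import pandas as pd


def get_weights(features):
    # Same toy implementation as in the question: equal weights per feature
    features = features.split(', ')
    return np.ones(len(features)) / len(features)


df = pd.DataFrame({'features': ['f1', 'f1, f2', 'f1, f2, f3']}, index=[1, 2, 3])

# One dict per row: {feature name: weight}. The names are split the same way
# get_weights splits them, so names and weights line up positionally.
records = [
    dict(zip(features.split(', '), get_weights(features)))
    for features in df['features']
]

# The constructor turns the union of dict keys into columns (in first-seen
# order) and leaves NaN where a row lacks a feature; fillna(0) replaces those.
result = pd.DataFrame(records, index=df.index).fillna(0)
print(result.round(2))
```

Because get_weights is an arbitrary Python function, it still runs once per row; the sketch only avoids extra per-row pandas overhead by collecting plain dicts first and building the frame in a single constructor call.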