Making pandas play nice with scikit-learn

Question

This is somewhat of a theoretical question. I know SO doesn't like code that isn't easily replicated, but please bear with me!

I've got a pandas DataFrame that I want to run a Lasso regression on. To do so, the best way I know of is getting the features into a numpy array:

    import numpy as np

    features = df[list(cols)].values
    features = np.nan_to_num(features)

Then I do the scikit-learn magic:

    from sklearn.linear_model import LassoCV

    lasso_model = LassoCV(cv=15, copy_X=True, normalize=True, max_iter=10000)
    lasso_fit = lasso_model.fit(features, label)
    lasso_path = lasso_model.score(features, label)
    print(lasso_model.coef_)

Now my problem is how to efficiently make pandas and numpy work together. This print shows something like:

    array([  1.69066749e-05,  -1.56013346e-05,   0.00000000e+00,
            -6.77086687e-06,   0.00000000e+00,   3.95920932e-08,
             0.00000000e+00,   6.54752484e-06,  -0.00000000e+00,
            -1.18676617e-05,  -7.36411973e-08,   4.72966581e-05,
             2.91028626e-06,   1.60674178e-05,   8.83195041e-06,
            -8.74769447e-02,   1.39914995e-04,  -1.86801467e-05,
             3.68593473e-01,   4.16009393e-01,   9.27391598e-07,
            -0.00000000e+00,   0.00000000e+00,  -4.07446333e-03,
             2.33648787e-01,   0.00000000e+00,   2.22660872e-02,
             0.00000000e+00,   3.04366897e-02,  -0.00000000e+00,
             0.00000000e+00,  -0.00000000e+00,  -0.00000000e+00,
             1.85141334e-01,   9.50727274e-02,  -4.94268994e-03,
             2.22993839e-01,   0.00000000e+00,   1.23715861e-02,
             0.00000000e+00,   5.42142052e-02,  -1.27412757e-02,
             2.98389804e-02,   1.35957828e-02,  -0.00000000e+00,
             3.64953613e-02,  -0.00000000e+00,   1.03289810e-01,])

This does me no good. How do I efficiently find out which coefficient belongs to which column?

I have found some hack-y ways to do some of it, but I'm thinking there is a much better way that I could do this.

For example, I know I can find the column with the largest coefficient like this:

    In [256]: coef.argmax()
    Out[256]: 19

    In [257]: cols[19]
    Out[257]: 'Price'

I think the main thing I'm wondering is how to get a dictionary of column name to coefficient pairs.

Thanks guys!

score 4 · Accepted Answer · edited May 23 '17 at 10:33


You can make a dictionary which maps cols to coefs like this:

    dict(zip(cols, coef))

This is a common pure Python idiom.
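As a minimal sketch of how that dictionary can then be used, the snippet below drops the coefficients that are exactly zero and ranks the rest by magnitude. The column names and values are made up for illustration; in the question they would be `cols` and `lasso_model.coef_`.

```python
# Hypothetical column names and coefficients, standing in for
# `cols` and `lasso_model.coef_` from the question.
cols = ["Price", "Volume", "Rating", "Age"]
coef = [0.416, 0.0, -0.087, 1.7e-05]

# Pair each column name with its coefficient.
coef_by_col = dict(zip(cols, coef))

# Keep only the features whose coefficient is not exactly zero.
nonzero = {name: c for name, c in coef_by_col.items() if c != 0}

# Rank the surviving features by absolute coefficient, largest first.
ranked = sorted(nonzero.items(), key=lambda item: abs(item[1]), reverse=True)

for name, c in ranked:
    print(f"{name:>8}: {c: .6g}")
```

`zip` pairs items by position, so this relies on `cols` being in the same order as the columns of the feature matrix passed to `fit`.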


answered Oct 02 '14 at 21:28

unutbu


Would it not be `coef[0]`, as it is an array of arrays of shape `(n_classes, n_features)`? – sapo_cosmico Dec 10 '15 at 12:57
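On the comment's point: whether indexing is needed depends on the shape of the coefficient array. For a regressor fitted on a single target, as in the question, `coef_` should be 1-D with shape `(n_features,)`, so `dict(zip(cols, coef))` works directly. When an estimator stores `coef_` as a 2-D array (for example one row per class or per target), one dictionary per row is needed. A rough sketch that handles both cases, where `coef_dicts` is a hypothetical helper and the data is made up:

```python
import numpy as np


def coef_dicts(cols, coef):
    """Map column names to coefficients for 1-D or 2-D coefficient arrays.

    Hypothetical helper: returns one dict for a 1-D array and a list of
    dicts (one per row) for a 2-D array.
    """
    coef = np.asarray(coef)
    if coef.ndim == 1:
        return dict(zip(cols, coef.tolist()))
    return [dict(zip(cols, row.tolist())) for row in coef]


# Made-up column names and coefficient arrays for illustration.
cols = ["Price", "Volume", "Rating"]
flat = coef_dicts(cols, np.array([0.4, 0.0, -0.1]))
stacked = coef_dicts(cols, np.array([[0.4, 0.0, -0.1], [0.2, 0.3, 0.0]]))
```

Checking `coef.ndim` (or `coef.shape`) on the fitted model first avoids accidentally zipping column names against whole rows.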
