Forward Selection Strategy - Regression with np.arrays()

Question

I have a np array like this,

[[ 1.     ,  2.33,  0.125     , 4.36   ,  0.     ,  0.215  ],
 [ 1.     ,  0.168 , 36.     ,  2.99   ,  0.198  ,  0.6683 ],
 [ 1.     ,  0.55778,  0.     , 21.89   ,  0.    ,  0.895  ],
 [ 1.     ,  1.62864,  0.     , 21.89   ,  0.    ,  0.624  ],
 [ 1.     ,  0.1146 , 20.     ,  6.96   ,  0.    ,  0.464  ],
 [ 1.     ,  0.55778,  0.     , 21.89   ,  0.    ,  0.624  ]]

Each column in this array is a feature, and the first column is the intercept. I am trying to write a forward selection function that selects the features whose p-value is lower than 0.05.

This is what I have so far,

import numpy as np
import statsmodels.api as sm

def forward(y, x):

    features = len(x[1])  # number of columns

    for i in range(0, features):
        # fit y on column i alone; pvalues then holds a single entry
        model = sm.OLS(y, x[:, [i]]).fit()
        pval = model.pvalues[0]

        if pval < 0.05:
            x = np.append(x, x[:, [i]], 1)  # Here, I want to append it to a new np.array
        else:
            continue  # go back and check next feature
    return x

I am having trouble appending the features with a low p-value to a new array. I looked up how to create new arrays online, but that requires the dimensions to be specified up front, and I don't know in advance how many features will be selected.

Otherwise, my only option is to keep the selected features in x. If I have to do that, how can I do it?

You have two approaches: 1. append to a list and convert it to an array, 2. initialize the array the biggest size you think it would take then chop it down. — anishtain4, Oct 11 '18 at 14:03
@anishtain4 I did not understand the 2nd option. Can you elaborate a little more. Sorry, I am fairly new to python — user9431057, Oct 11 '18 at 14:34
say `x` can have at max 10 elements, then `x=np.empty(10)` and add a counter in the loop, then `x=x[:counter]` — anishtain4, Oct 11 '18 at 14:44
@anishtain4 thank you! But still my approach does not give me the correct answer :( — user9431057, Oct 11 '18 at 18:43
Question has actually nothing to do with `machine-learning` - kindly do not spam the tag (removed). — desertnaut, Oct 11 '18 at 21:09
@desertnaut understood. I think that makes sense and thank you for correcting me :) — user9431057, Oct 12 '18 at 14:23
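
The two approaches suggested in the comments above can be sketched roughly as follows. This is only an illustration: `keep` is a hypothetical predicate that stands in for the p-value test, and the function names are made up for this example.

```python
import numpy as np

def keep_with_list(x, keep):
    """Approach 1: collect columns in a list, then build the array once."""
    cols = []
    for i in range(x.shape[1]):
        if keep(x[:, i]):
            cols.append(x[:, i])
    if not cols:
        # nothing selected: return an array with zero columns
        return np.empty((x.shape[0], 0), dtype=x.dtype)
    return np.column_stack(cols)

def keep_with_prealloc(x, keep):
    """Approach 2: allocate the largest possible result, then trim it."""
    out = np.empty_like(x)      # at most as many columns as x
    count = 0                   # number of columns filled so far
    for i in range(x.shape[1]):
        if keep(x[:, i]):
            out[:, count] = x[:, i]
            count += 1
    return out[:, :count]       # drop the unused columns
```

Both functions return the same array. The list version is usually easier to read, while the preallocated version avoids building intermediate Python lists.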

score 0 · Answer 1 · answered Oct 11 '18 at 22:24

Notation aside, there is a significant bug in your code: you append the selected columns back onto the input array x, so you end up with repeated columns. I haven't run the code below, but it should work fine.

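import numpy as np
import statsmodels.api as sm

def forward(y, x):

    features = len(x[1])  # number of columns

    x_new = np.empty_like(x)  # room for every column; trimmed on return
    j = 0                     # number of columns kept so far
    for i in range(features):
        # fit y on column i alone; pvalues then holds a single entry
        model = sm.OLS(y, x[:, i]).fit()
        pval = model.pvalues[0]

        if pval < 0.05:
            x_new[:, j] = x[:, i]
            j += 1
    return x_new[:, :j]  # only the first j columns were filled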

answered Oct 11 '18 at 22:24 by anishtain4

I have a question: if my first feature has a p-value lower than the threshold, I need both the first and the second feature in the second iteration of i. Is that possible? – user9431057 Oct 13 '18 at 00:48
