I have a NumPy array like this:

[[ 1.     ,  2.33,  0.125     , 4.36   ,  0.     ,  0.215  ],
 [ 1.     ,  0.168 , 36.     ,  2.99   ,  0.198  ,  0.6683 ],
 [ 1.     ,  0.55778,  0.     , 21.89   ,  0.    ,  0.895  ],
 [ 1.     ,  1.62864,  0.     , 21.89   ,  0.    ,  0.624  ],
 [ 1.     ,  0.1146 , 20.     ,  6.96   ,  0.    ,  0.464  ],
 [ 1.     ,  0.55778,  0.     , 21.89   ,  0.    ,  0.624  ]]

Each column in this array is a feature; the first column is the intercept. I am trying to write a forward selection function that selects the features whose p-value is lower than 0.05.

This is what I have so far:

import numpy as np
import statsmodels.api as sm

def forward(y, x):

    features = len(x[1])

    for i in range(0,features):
        model = sm.OLS(y,x[:,[i]]).fit()
        pval = model.pvalues

        if pval < 0.05:
            x = np.append(x,x[:,[i]],1) # Here, I want to append it to a new np.array
        else:
            pass # go back and check next feature
    return x

I am having trouble appending the features with a low p-value to a new array. I looked up how to create new arrays online, but that requires the dimensions to be specified up front, and I don't yet know how many columns there will be.

Otherwise, my only option is to keep the feature in x. If I have to keep the feature, how can I do that?

desertnaut
user9431057

1 Answer

Apart from some poor notation, there is a big bug in your code: you append the selected column to the input array x itself, so you end up with repeated columns. I haven't run this code, but it should work fine.

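import numpy as np
import statsmodels.api as sm

def forward(y, x):

    features = x.shape[1]

    # Preallocate room for every column; only the first j get filled.
    x_new = np.empty_like(x)
    j = 0
    for i in range(features):
        model = sm.OLS(y, x[:, i]).fit()
        # One regressor per fit, so there is exactly one p-value.
        pval = np.asarray(model.pvalues)[0]

        if pval < 0.05:
            x_new[:, j] = x[:, i]
            j += 1
    # j columns were filled, so slice up to j (j+1 would add a garbage column).
    return x_new[:, :j]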
anishtain4
  • I have a question: if my first feature has a p-value lower than the threshold, I need the first and second features in my second iteration of i. Is that possible? – user9431057 Oct 13 '18 at 00:48