I have a NumPy array like this:
[[ 1. , 2.33, 0.125 , 4.36 , 0. , 0.215 ],
[ 1. , 0.168 , 36. , 2.99 , 0.198 , 0.6683 ],
[ 1. , 0.55778, 0. , 21.89 , 0. , 0.895 ],
[ 1. , 1.62864, 0. , 21.89 , 0. , 0.624 ],
[ 1. , 0.1146 , 20. , 6.96 , 0. , 0.464 ],
[ 1. , 0.55778, 0. , 21.89 , 0. , 0.624 ]]
Each column in this array is a feature; the first column is the intercept value. I am trying to write a forward selection function that selects the features with a p-value lower than 0.05.
This is what I have so far:
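import numpy as np
import statsmodels.api as sm

def forward(y, x):
    features = x.shape[1]  # number of columns (candidate features)
    for i in range(features):
        # Fit a one-feature OLS model on column i alone
        model = sm.OLS(y, x[:, [i]]).fit()
        pval = model.pvalues[0]  # one coefficient, so one p-value
        if pval < 0.05:
            x = np.append(x, x[:, [i]], 1)  # Here, I want to append it to a new np.array
        else:
            continue  # go back and check next feature
    return x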
I am having trouble appending the columns with a low p-value to a new array. I looked up how to create new arrays online, but everything I found requires the dimensions to be specified up front, and I don't know in advance how many columns will be selected.
Otherwise, my only option is to keep the selected features in x. If I have to do it that way, how can I keep just those features?
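
One possible way around the unknown final size, sketched here under some assumptions: collect the indices of the qualifying columns in a plain Python list, which can grow freely, and build the new array once at the end by indexing x with that list. The names select_columns, pvalue_fn and ols_pvalue are hypothetical helpers introduced only for illustration; ols_pvalue wraps the same one-column sm.OLS fit used in the loop above. Like that loop, this screens each column on its own, so it is not a full stepwise forward selection.

```python
import numpy as np

def select_columns(y, x, pvalue_fn, threshold=0.05):
    """Return (indices, sub-array) of the columns whose p-value is below threshold.

    pvalue_fn is a hypothetical callable taking (y, column) and returning a float.
    """
    selected = []  # a list needs no dimensions up front
    for i in range(x.shape[1]):
        column = x[:, [i]]  # keep it 2-D: shape (n_rows, 1)
        if pvalue_fn(y, column) < threshold:
            selected.append(i)
    # Build the result once, after the loop, by fancy-indexing the columns.
    return selected, x[:, np.array(selected, dtype=int)]

def ols_pvalue(y, column):
    """Assumed helper: p-value of a single-column OLS fit, as in the question."""
    import statsmodels.api as sm  # imported lazily; only needed for this helper
    return float(np.asarray(sm.OLS(y, column).fit().pvalues)[0])
```

It could then be called as idx, x_new = select_columns(y, x, ols_pvalue). If no column passes the threshold, x_new simply has zero columns. Returning the indices alongside the array also makes it easy to see which features were kept.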