Questions tagged [variable-selection]

35 questions
5 votes, 3 answers

Obtain variable selection order from glmnet

I've been using the glmnet R package to build a LASSO regression model for one target variable Y (numeric) and 762 covariates. I use the glmnet() function and then coef(fit, s = 0.056360) to get the coefficient values for that specific value of…
Roman
2 votes, 0 answers

Variable selection involving a mixture of numerical, high-cardinality, and low-cardinality features

Consider a dummy dataframe: A B C D …. Z 1 2 as we 2 2 4 qq rr 5 4 5 tz rc 9 This dataframe has 25 independent variables and one target variable; the independent variables are a mixture of high-cardinality features, numerical features and low…
1 vote, 1 answer

How to use CAST package for a shapefile (polygons) in R?

Any help with the following is really appreciated! My goal: I need to run a lasso model for variable selection on my data (which is in sf polygon format). My data: As said above, it is an sf object. Specifically, it is a shapefile with polygons. I have…
1 vote, 0 answers

Why am I getting error "numbers of columns of arguments do not match" for some model specifications but not others?

I'm trying to use the caret function rfe for variable selection. One of the models I would like to use it on is a logistic regression. I'm trying to follow the example here:…
canderson156
1 vote, 1 answer

How to capture the most important variables in Bootstrapped models in R?

I have several models, Lasso being one of them, whose choices of important predictors I would like to compare over the same data set. The data set I am using consists of census data with around a thousand variables that have been renamed to…
1 vote, 1 answer

How do I model this linear programming problem in Python?

I have been tasked to program a Dantzig Selector using Python, but I was given no guidelines and do not have much experience in linear programming or data science. I cannot find the information I need in LP module manuals, or in other questions on…
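For orientation, the Dantzig Selector is usually stated as an L1 minimization with a sup-norm constraint, which can be rewritten as a linear program by introducing auxiliary bounds on the coefficients. The symbols below (design matrix X, response y, tuning parameter λ, auxiliary vector u) are notation assumed here, not taken from the question:

```latex
% Dantzig Selector (standard statement)
\min_{\beta \in \mathbb{R}^p} \|\beta\|_1
\quad \text{subject to} \quad
\|X^\top (y - X\beta)\|_\infty \le \lambda

% Equivalent linear program in (\beta, u), with u bounding |\beta| elementwise
\min_{\beta,\, u \in \mathbb{R}^p} \sum_{i=1}^{p} u_i
\quad \text{subject to} \quad
-u \le \beta \le u, \qquad
-\lambda \mathbf{1} \le X^\top (y - X\beta) \le \lambda \mathbf{1}
```

In this form the objective and all constraints are linear in (β, u), so it can be handed to a general-purpose LP solver.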
1 vote, 1 answer

How can I add t-statistic values to all the selection models of stepwise selection in PROC REG in SAS?

proc reg data= outest=regout; model =/selection=stepwise sle=0.1 sls=0.05 details=all vif; run; Using the above code in SAS produces 3 tables (since stepwise did not drop any variable) at each step: Statistics for…
Prem Keshri
1 vote, 1 answer

How to use user input to access a variable?

I am creating a program in which I would like to prompt users for NYC boroughs, and use a general GPS coordinate of said borough. Part of my code for this is df.loc[0,'BOROUGH'] = input('Borough:\n') manhattan = [40.7831, -73.9712] brooklyn =…
Yehuda
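One common way to approach this kind of lookup is to store the coordinates in a dictionary keyed by borough name instead of in separate variables. The sketch below is only illustrative: `BOROUGH_COORDS` and `lookup_borough` are hypothetical names, the Manhattan pair is the one quoted in the question, and the Brooklyn pair is a placeholder because the original value is cut off.

```python
# Map normalized borough names to [latitude, longitude].
# Only the Manhattan pair comes from the question; the Brooklyn entry is a
# placeholder (the original value is truncated) and must be replaced.
BOROUGH_COORDS = {
    "manhattan": [40.7831, -73.9712],
    "brooklyn": [0.0, 0.0],  # placeholder, not a real coordinate
}


def lookup_borough(name):
    """Return the coordinate list for a borough name, or None if unknown."""
    key = name.strip().lower()  # tolerate stray spaces and capitalization
    return BOROUGH_COORDS.get(key)
```

With this in place, the prompt from the question could be passed straight through, for example `lookup_borough(input('Borough:\n'))`, and a `None` result signals an unrecognized borough.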
1 vote, 1 answer

glmnet produces an error while using prediction

I seem to have a problem with glmnet. I want to run a regular LASSO regression to understand which of 10 variables (Dim1, Dim2...) contribute the most to predict my continuous variable ptScores. All variables are continuous, validInd is a…
Mayan
1 vote, 1 answer

Variable selection in Random forest and prediction accuracy

I have a cross-section data set repeated for 2 years, 2009 and 2010. I am using the first year (2009) as a training set to train a Random Forest for a regression problem and the second year (2010) as a test set. Load the data df <-…
et_
1 vote, 1 answer

Using AIC for variable selection and to evaluate criterion in Multiple Regression

I am fairly new to R and Python. I would like to perform multiple regression using the Akaike Information Criterion for variable selection and to evaluate my criterion. I have written some code to select my variables using the F-statistic p-value. The dataset…
wjie08
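As a reminder of what is being compared when AIC drives the selection, the general definition and its usual form for a Gaussian linear regression are sketched below. The symbols (k for the number of estimated parameters, n for the number of observations, RSS for the residual sum of squares) are notation assumed here, not taken from the question:

```latex
% General definition, with \hat{L} the maximized likelihood
\mathrm{AIC} = 2k - 2\ln \hat{L}

% Gaussian linear regression with the maximum-likelihood variance estimate
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + C,
\qquad C = n \ln(2\pi) + n
```

Because C is the same for every candidate model fitted to the same data, only the first two terms matter when ranking models; lower AIC is preferred.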
1 vote, 1 answer

How can I apply extreme bounds analysis to a dataset of over 100 variables with the ExtremeBounds package in R?

I have a dataset consisting of 107 variables with 1794 observations. I want to implement Extreme Bounds Analysis in order to determine which of the 106 variables are robustly correlated with the dependent variable throughout a wide range of…
Werther
0 votes, 0 answers

LassoCV results depend on the number of input variables?

I want to perform variable selection using Lasso regression, as I am not sure how many (lagged) variables X still have an effect on my variable y. However, the resulting model, and also which variables end up being zero, is different for different…
0 votes, 0 answers

Implementing Random-Forest or Lasso with a spatio-temporal shapefile in R?

Is there a way to implement either a Random-Forest or Lasso model (for variable selection) in R with a pooled spatio-temporal dataset? Data: My data is a pooled spatial dataset for multiple years. Specifically, it is a polygon with information of…
0 votes, 1 answer

Variable selection in big data

I am trying to build a regression model for big data with 220 variables. The 220 variables are binary, taking the values zero and one. Some variables are correlated (not highly correlated). Also, some of the variables have 60% or more of their…