0

I am working on a tv retail dataset in R and wanted to put steps I will need to use repeatedly into a function.

This includes checking the VIF and return it, run the STEP algorithm to determine the best model and then use the result of the STEP and display it.

The major issue is the error message

Error in eval(predvars, data, env) : object 'Hour' not found

which appears to appear in the step() call.

Regression <- function(data, dep_var, features) {
  lin.null = lm(paste(dep_var,'~ 1', sep = ''), data= data)
  lin.full = lm(paste(dep_var,'~', paste(features, collapse='+'), sep = ''), data = data)

  vif(lin.full)

  opt = step(lin.null, scope = list(lower = lin.null, upper = lin.full), direction = "forward")

step_opt = opt$call

stargazer(step_opt, type = 'text')
}

dep_var = 'imp'
feat = c('Hour', 'grp')
paste(dep_var,'~', paste(feat, collapse='+'), sep = '')

Regression(comb_a, 'imp', feat)

The final result should show me the VIF values for each variable and the stargazer output of the STEP optimized regression.

EDIT 1:

comb_a is the input data the regression should take The dput() output follows down below:

# comb_a 
structure(list(Day = structure(c(1483833600, 1483833600, 1483833600, 
1483833600, 1483833600, 1483833600), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9), Model = c("Model A", 
"Model A", "Model A", "Model A", "Model A", "Model A"), tv_count = c(5L, 
8L, 4L, 9L, 11L, 8L), grp_abs = c(55500, 8308, 19026, 12184, 
10141, 113225), grp = c(0.22, 0.03, 0.07, 0.05, 0.04, 0.45), 
    sum_duration = c(150, 240, 120, 270, 330, 240), grp_per_second = c(370, 
    34.6166666666667, 158.55, 45.1259259259259, 30.730303030303, 
    471.770833333333), hours_since = c(NA, 1, 5, 1, 1, 1), camp_count = c(2L, 
    2L, 2L, 2L, 3L, 4L), imp = c(528, 319, 97, 182, 327, 785), 
    clicks = c(28, 15, 6, 13, 29, 53), leads = c(0, 0, 0, 0, 
    0, 1)), .Names = c("Day", "Hour", "Model", "tv_count", "grp_abs", 
"grp", "sum_duration", "grp_per_second", "hours_since", "camp_count", 
"imp", "clicks", "leads"), row.names = c(NA, -6L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = c("Day", "Hour"), drop = TRUE, indices = list(
    0L, 1L, 2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 
1L, 1L), biggest_group_size = 1L, labels = structure(list(Day = structure(c(1483833600, 
1483833600, 1483833600, 1483833600, 1483833600, 1483833600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Hour = c(0, 1, 6, 7, 8, 9)), row.names = c(NA, 
-6L), class = "data.frame", vars = c("Day", "Hour"), drop = TRUE, .Names = c("Day", 
"Hour")))

desired output would be: (Numbers are just for representation)

> vif(lin.full)
          Hour            grp   sum_duration grp_per_second    hours_since     camp_count 
      2.979362       4.981504       2.290328       3.279818       1.013725       1.110823 
           imp         clicks 
      7.471457       9.244811 

> stargazer(step_opt, type = 'text')
===============================================
                        Dependent variable:    
                    ---------------------------
                               leads           
-----------------------------------------------
clicks                       0.005***          
                             (0.0004)          

camp_count                    0.040*           
                              (0.024)          

Constant                      -0.107           
                              (0.098)          

-----------------------------------------------
Observations                    898            
R2                             0.181           
Adjusted R2                    0.179           
Residual Std. Error      0.772 (df = 895)      
F Statistic           98.901*** (df = 2; 895)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
dobero
  • 71
  • 5
  • 1
    `comb_a` is not defined here so it's not really possible to test and debug this code. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 22 '19 at 15:26
  • @MrFlick thank you for the advice! Added input and output. – dobero Apr 22 '19 at 22:23

0 Answers0