I am fairly new to R, though I've programmed in Python and Java a lot. I have searched the existing questions about using a for loop to run through a list of variables, and everyone recommends using lapply instead. I have done that, and my code works in the sense that it gives me the answers, but the output hides an important detail. Here's my code and some of the output.

> bat <- read.csv(file="mlbTeam2016-B.csv", header=TRUE)
> varlist <- names(bat)[6:32]
> varlist
 [1] "AB.B"    "R.B"     "H.B"     "X2B.B"   "X3B.B"   "HR.B"    "RBI.B"  
 [8] "BB.B"    "SO.B"    "SB.B"    "CS.B"    "AVG.B"   "OBP.B"   "SLG.B"  
[15] "OPS.B"   "IBB.B"   "HBP.B"   "SAC.B"   "SF.B"    "TB.B"    "XBH.B"  
[22] "GDP.B"   "GO.B"    "AO.B"    "GO_AO.B" "NP.B"    "PA.B" 
> lapply(varlist, function(i){
+ var <- eval(parse(text=paste("bat$",i)))
+ cor.test(bat$W, var, alternative="two.sided", method="pearson")
+ })
[[1]]

        Pearson's product-moment correlation

data:  bat$W and var
t = 0.35067, df = 28, p-value = 0.7285
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3013221  0.4164731
sample estimates:
       cor 
0.06612551 


etc

The problem is that each output says data: bat$W and var without telling me which variable is being tested in that step. The results themselves are fine, but I have to go back and look up which variable each one corresponds to. That is better than typing this code in dozens of times, but not ideal. I also know that using eval(parse( is bad, but I can't figure out another way to handle that line.

This is my desired output:

[[1]]

        Pearson's product-moment correlation

data:  bat$W and bat$AB.B
t = 0.35067, df = 28, p-value = 0.7285
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3013221  0.4164731
sample estimates:
       cor 
0.06612551 
Tim Brauch
  • Possible duplicate of [Access lapply index names inside FUN](http://stackoverflow.com/questions/9950144/access-lapply-index-names-inside-fun) – manotheshark Mar 31 '17 at 21:48
  • Why the ugly and inefficient `var <- eval(parse(text=paste("bat$",i)))`? Just try `var<-bat[[i]]` instead. As for your question, use `sapply` with the `simplify=FALSE` argument instead of `lapply`. – nicola Mar 31 '17 at 21:56
  • @nicola Making those two changes seems to have made the code usable now. It now tells me what variable it is testing instead of the [[1]] at the start of the output. – Tim Brauch Mar 31 '17 at 22:10
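
For reference, here is a minimal sketch of how the two suggestions from nicola's comment might look together. The CSV from the question is not available here, so the built-in mtcars data stands in for bat, with mpg playing the role of W; the names dat and results are purely illustrative. Overwriting data.name is an extra step that is not part of the comment; it is one way to get the "data:" line shown in the desired output.

```r
# mtcars ships with R and stands in for the asker's `bat` data frame (assumption);
# mpg plays the role of bat$W.
dat <- mtcars
varlist <- names(dat)[2:5]   # "cyl" "disp" "hp" "drat"

# sapply() with simplify = FALSE returns a list, like lapply(), but because
# varlist is a character vector the list elements are named after it.
results <- sapply(varlist, function(v) {
  # dat[[v]] pulls a column by its name, so eval(parse()) is not needed
  res <- cor.test(dat$mpg, dat[[v]], alternative = "two.sided", method = "pearson")
  # Optional: relabel the "data:" line that is printed for each test
  res$data.name <- paste0("dat$mpg and dat$", v)
  res
}, simplify = FALSE)

results$hp   # printed under the variable's name rather than [[3]]
```

Printing results as a whole shows each test under a $cyl, $disp, ... header, and any single test can be pulled out by name.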

1 Answer

I would suggest creating a correlation matrix rather than running each test separately with lapply.

This link walks through how to do that: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software

You can select the variables you want using dplyr:

library(dplyr)
# one_of() still works; newer dplyr/tidyselect versions prefer all_of() or any_of()
select(bat, one_of(varlist))

This should be a bit easier than the approach you are using.
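
A rough sketch of the correlation-matrix idea, using base R only. As before, the built-in mtcars data is a stand-in for bat and mpg is a stand-in for W (both assumptions, since the original CSV is not available), and cmat and tab are illustrative names. Note that cor() returns only the coefficients, so the last step falls back on cor.test() to collect p-values into one table.

```r
# mtcars stands in for `bat`, mpg for `W` (assumptions for illustration only)
dat <- mtcars
varlist <- c("cyl", "disp", "hp", "drat")

# Correlation matrix for the response plus the selected columns
cmat <- cor(dat[, c("mpg", varlist)], method = "pearson")

# Only the row of interest: mpg against every variable in varlist
round(cmat["mpg", varlist], 3)

# cor() gives no p-values; this builds one row per variable with both numbers
tab <- t(sapply(varlist, function(v) {
  res <- cor.test(dat$mpg, dat[[v]])
  c(cor = unname(res$estimate), p.value = res$p.value)
}))
tab
```

The resulting table has the variable names as row names, so there is no need to match results back to varlist by position.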

Ian Wesley