Unable to pass variable names correctly in a for loop or lapply()

When I run this command without a loop or lapply, it works and I get values:

> boxplot.stats(df$price)$out

 [1] 38.7 43.8 41.3 50.0 50.0 50.0 50.0 37.2 39.8 37.9 50.0
[12] 50.0 42.3 48.5 50.0 44.8 50.0 37.6 46.7 41.7 48.3 42.8
[23] 44.0 50.0 43.1 48.8 50.0 43.5 45.4 46.0 50.0 37.3 50.0
[34] 50.0 50.0 50.0 50.0

But when I put this inside lapply or a for loop, I get NULL. Why?

df_numeric_names <- names(select_if(df, is.numeric))
df_numeric_names

[1] "price"       "resid_area"  "air_qual"    "room_num"    "age"         "dist1"       "dist2"       "dist3"      
 [9] "dist4"       "teachers"    "poor_prop"   "n_hos_beds"  "n_hot_rooms" "rainfall"    "parks"       "Sold" 

loop

for (feature in df_numeric_names){
  outlier_values <- boxplot.stats(df$feature)$out
  print(outlier_values)
}

 - Output:

NULL
NULL
NULL

lapply

lapply(df_numeric_names, function(x) {
  boxplot.stats(df$x)$out
  
})

 - Output:
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL


This is a fairly simple thing, but I am not sure what I am doing wrong or how to fix it.

ViSa

1 Answer

This slight change in the loop could solve your issue:

for (feature in df_numeric_names){
  outlier_values <- boxplot.stats(df[,feature])$out
  print(outlier_values)
}
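
As for why the original returned NULL: `$` does not evaluate the variable after it, so `df$feature` looks for a column literally named "feature". Bracket indexing evaluates its argument. Here is a minimal sketch of the difference on a made-up data frame (the column names and values are assumptions for illustration only). It uses `[[`, which always returns the column as a vector; `df[, feature]` does the same for a base data.frame, but returns a one-column tibble if `df` is a tibble.

```r
# Toy data frame; the column names and values are made up for illustration.
df <- data.frame(price = c(10, 12, 11, 13, 50), age = c(1, 2, 3, 4, 5))
feature <- "price"

# `$` does not evaluate `feature`; it looks for a column literally
# named "feature", finds none, and returns NULL.
print(is.null(df$feature))

# `[[` evaluates its argument, so the value "price" selects the price column.
print(identical(df[[feature]], df$price))

# The same loop as above, written with `[[`.
for (feature in names(df)) {
  outlier_values <- boxplot.stats(df[[feature]])$out
  print(outlier_values)
}
```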

And a little example:

library(dplyr)
#Data
data("iris")
df <- iris
#Numeric names
df_numeric_names <- names(select_if(df, is.numeric))
#Loop
for (feature in df_numeric_names){
  outlier_values <- boxplot.stats(df[,feature])$out
  print(outlier_values)
}

The output:

numeric(0)
[1] 4.4 4.1 4.2 2.0
numeric(0)
numeric(0)

With lapply(), use similar code:

lapply(df_numeric_names, function(x) {
  boxplot.stats(df[,x])$out
})

Output:

[[1]]
numeric(0)

[[2]]
[1] 4.4 4.1 4.2 2.0

[[3]]
numeric(0)

[[4]]
numeric(0)
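
If it helps to see which column each result belongs to, one option is to iterate over the columns themselves rather than their names, since lapply() keeps the names of a data frame's columns. This is only a sketch of that variant on the same iris data; `num_cols` and `outliers` are names introduced here for illustration.

```r
df <- iris

# Logical vector marking the numeric columns (base R, no dplyr needed).
num_cols <- vapply(df, is.numeric, logical(1))

# Iterating over the columns directly returns a list named by column.
outliers <- lapply(df[num_cols], function(col) boxplot.stats(col)$out)

print(names(outliers))
print(outliers$Sepal.Width)
```

Each element can then be looked up by column name, for example `outliers$Sepal.Width`, instead of by position.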
Duck
  • Thanks @Duck for including the example; it worked. But I still don't understand why subsetting with df[, feature] works while selecting the feature directly with `$` does not. These things in R are becoming a bit confusing. Thanks anyway, I will accept your answer once they let me. – ViSa Aug 26 '20 at 14:45
  • Got more explanation from https://stackoverflow.com/questions/18222286/dynamically-select-data-frame-columns-using-and-a-character-value. – ViSa Aug 26 '20 at 14:49
  • Yeah, nice! The `$` operator causes trouble here because it takes the name after it literally and looks for a column with exactly that name in the data frame; it does not evaluate a variable. That is why brackets are the better choice when the column name is stored in a variable, as in the examples. It would be great if you accept this answer :) – Duck Aug 26 '20 at 14:51