I am working through Datacamp's Intro to R course, but I do not understand why this code works:

# Define columns
columns <- c("trip_distance", "total_amount", "passenger_count")

# Create summary function
taxis_summary <- function(col, data = taxis) {
  c(
    mean = mean(data[[col]]), 
    sd = sd(data[[col]]),
    quantile(data[[col]], c(0.25, 0.5, 0.75))
    )
}

# Use sapply to summarize columns
sapply(columns, taxis_summary)

but this code throws:

Unknown or uninitialised column: 'col'. Argument is not numeric or logical: returning NA

# Define columns
columns <- c("trip_distance", "total_amount", "passenger_count")

# Create summary function
taxis_summary <- function(col, data = taxis) {
  c(
    mean = mean(data$col), 
    sd = sd(data$col),
    quantile(data$col, c(0.25, 0.5, 0.75))
    )
}

# Use sapply to summarize columns
sapply(columns, taxis_summary)
  • When you write `data$col` R searches for `col` in `data`. It does not "know" to evaluate `col` first. This was a difficult transition for me because I used Stata before coming to R and this type of "local macro" substitution is a big part of programming with Stata. – DanY Aug 22 '18 at 21:46
  • Not that it doesn't "know" to evaluate `col` first, but it's not told to. The way to tell it is to use `[[` instead of `$` :-) – lebatsnok Aug 22 '18 at 21:52
  • You can use `$` the way you want but it's more complicated. Let's have a simple data frame, `df <- data.frame(a=1,b=2,c=3)`, and a vector indicating the column we want to extract, `col <- "a"`. You can do `df$a` but equivalently, `as.call(c(as.name("$"), as.name("df"), col))` will do it as well. Umm, sorry, you have to evaluate it as well, so it becomes `eval(as.call(c(as.name("$"), as.name("df"), col)))`. – lebatsnok Aug 22 '18 at 22:20
  • So if you insist on using `$` with `lapply` here (which would be instructive), you can easily do it: `col <- c("a", "b", "c")`, and `lapply(col, function(x) eval(as.call(c(as.name("$"), as.name("df"), x))))`. – lebatsnok Aug 22 '18 at 22:23
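
The `$`-based workaround from the comments above can be put together as one runnable snippet. This is only a sketch: `df` and `cols` are the toy objects from the comments (with the vector renamed to `cols` here), and `via_dollar` and `via_brackets` are illustrative names. It builds the call `` `$`(df, "a") `` by hand, evaluates it, and compares the result with the plain `[[` form.

```r
# Toy data frame from the comments, and the column names to extract
df <- data.frame(a = 1, b = 2, c = 3)
cols <- c("a", "b", "c")

# Build the call `$`(df, "<name>") programmatically, then evaluate it
via_dollar <- lapply(cols, function(x) {
  eval(as.call(c(as.name("$"), as.name("df"), x)))
})

# The `[[` form gives the same values without constructing a call
via_brackets <- lapply(cols, function(x) df[[x]])

identical(via_dollar, via_brackets)
```

Constructing and evaluating a call is instructive, but `[[` is usually the simpler choice when the column name is held in a variable.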

1 Answer

R offers several ways to access columns in a data frame, and they differ in how they interpret the column name you give them.

One way is what DataCamp showed, `data[[col]]`. Another is the `$` accessor, as in `data$col`. The `[[` operator evaluates its argument, so `col` is replaced by the string it holds, and R finds "trip_distance", "total_amount", and "passenger_count" in turn. The `$` operator does not evaluate its right-hand side: it treats `col` as a literal name and looks for a column actually called "col". The warning reports that no such column exists.
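
A minimal sketch of the difference follows. The `taxis` data frame here is a hypothetical stand-in with invented values, since the course data is not shown in the question, and `by_variable`, `by_literal`, and `result` are illustrative names.

```r
# Hypothetical stand-in for the course's `taxis` data (values invented)
taxis <- data.frame(
  trip_distance   = c(1.2, 3.4, 0.8, 5.0),
  total_amount    = c(8.5, 17.0, 6.3, 24.8),
  passenger_count = c(1, 2, 1, 4)
)

col <- "trip_distance"

# `[[` evaluates `col`, so the string it holds is used as the column name
by_variable <- taxis[[col]]

# `$` takes `col` literally and looks for a column named "col";
# a base data.frame returns NULL here, while a tibble also emits a warning
by_literal <- taxis$col

# Summary function using `[[`, as in the working version from the question
taxis_summary <- function(col, data = taxis) {
  c(
    mean = mean(data[[col]]),
    sd = sd(data[[col]]),
    quantile(data[[col]], c(0.25, 0.5, 0.75))
  )
}

columns <- c("trip_distance", "total_amount", "passenger_count")
result <- sapply(columns, taxis_summary)
print(result)
```

With `[[`, `sapply` passes each column name as a string and the function looks it up by value, which is why that version works for every element of `columns`.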