I have built a function where I want to pass a data frame and a column from the data frame. For example:

testdf <- structure(list(date = c("2016-04-04", "2016-04-04", "2016-04-04", 
"2016-04-04", "2016-04-04", "2016-04-04"), sensorheight = c(1L, 
16L, 1L, 16L, 1L, 16L), farm = c("McDonald", "McDonald", "McDonald", 
"McDonald", "McDonald", "McDonald"), location = c("4", "4", "5", 
"5", "Outside", "Outside"), Temp = c(122.8875, 117.225, 102.0375, 
98.3625, 88.5125, 94.7)), .Names = c("date", "sensorheight", 
"farm", "location", "Temp"), row.names = c(NA, 6L), class = "data.frame")

> testdf
        date sensorheight     farm location     Temp
1 2016-04-04            1 McDonald        4 122.8875
2 2016-04-04           16 McDonald        4 117.2250
3 2016-04-04            1 McDonald        5 102.0375
4 2016-04-04           16 McDonald        5  98.3625
5 2016-04-04            1 McDonald  Outside  88.5125
6 2016-04-04           16 McDonald  Outside  94.7000

The function subtracts some values from others based on the values in other columns. It used to work, accepting the data frame and column as inputs, but since I updated R it no longer works.

library(dplyr)  # provides %>%, filter, group_by, arrange, summarise, select, bind_rows

DailyInOutDiff <- function (df, variable) {

  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}

test <- DailyInOutDiff(testdf, "Temp")
test <- DailyInOutDiff(testdf, quote(Temp))

They produce the following error messages:

  Error in summarise_impl(.data, dots) : 
  Evaluation error: non-numeric argument to binary operator. 

And

  Error in summarise_impl(.data, dots) : 
  Evaluation error: object of type 'symbol' is not subsettable. 

I would like to know the meaning of these error messages and how to address them.

I tried the solutions in "Pass a data.frame column name to a function", but none of them worked for me.

The errors do not occur if I remove the column as an input, but I need the column because I am applying the function to multiple columns in a large data frame.

The output I would like:

        date sensorheight     farm location     Temp
1 2016-04-04            1 McDonald        4  34.3750
2 2016-04-04           16 McDonald        4  22.5250
3 2016-04-04            1 McDonald        5  13.5250
4 2016-04-04           16 McDonald        5   3.6625
phaser

3 Answers

I couldn't replicate the second error, but I could replicate the first one. It seems that summarise has trouble with Temp because it receives it as a character string. In other words, you are passing a column name, not a column. If you run the code inside the function line by line, and instead of variable you use df$variable, you will see that it works.
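To make the two error messages concrete, here is a minimal base R sketch that triggers the same messages outside dplyr. It assumes that, inside summarise, variable is simply the value that was passed in (a string in the first call, a symbol in the second); the location vector is made up for illustration.

```r
# Illustrative stand-in for one group's location column (hypothetical values)
location <- c("4", "Outside")

# Case 1: variable holds the string "Temp", not the Temp column.
# Subsetting a string yields strings, and subtracting strings fails.
variable <- "Temp"
msg_chr <- tryCatch(
  variable[location == "4"] - variable[location == "Outside"],
  error = function(e) conditionMessage(e)
)

# Case 2: variable holds a symbol, as produced by quote(Temp).
# A symbol is not a vector, so it cannot be subset with [ ].
variable <- quote(Temp)
msg_sym <- tryCatch(
  variable[location == "4"],
  error = function(e) conditionMessage(e)
)

print(msg_chr)
print(msg_sym)
```

In both cases the code never touches the Temp column itself, which would explain why the errors disappear once the column is referred to directly.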

That being said, the solution is pretty simple. I just added the line variable <- as.name(variable) to your function. It now reads:

DailyInOutDiff <- function (df, variable) {

  variable <- as.name(variable)  # convert the column-name string to a symbol
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}

And the output is:

> test <- DailyInOutDiff(testdf, "Temp")
> test
Source: local data frame [4 x 5]
Groups: date, sensorheight [2]

        date sensorheight     farm location    Diff
       <chr>        <int>    <chr>    <chr>   <dbl>
1 2016-04-04            1 McDonald        4 34.3750
2 2016-04-04           16 McDonald        4 22.5250
3 2016-04-04            1 McDonald        5 13.5250
4 2016-04-04           16 McDonald        5  3.6625
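On dplyr 0.7 or later, a symbol stored in a variable may not be looked up as a column automatically (this matches the second error in the question). One possible variation, sketched here under that assumption, is to unquote the symbol with !!. The data frame toy and the helper in_out_diff_sym are hypothetical and only illustrate the idea on a smaller example.

```r
library(dplyr)

# Hypothetical toy data, not taken from the question
toy <- data.frame(
  site  = c("a", "a", "b", "b"),
  where = c("In", "Outside", "In", "Outside"),
  Temp  = c(20, 15, 18, 17)
)

# Illustrative helper: takes the column name as a string
in_out_diff_sym <- function(df, variable) {
  col <- as.name(variable)  # string -> symbol
  df %>%
    group_by(site) %>%
    # !! splices the symbol into the expression, so dplyr sees Temp[...]
    summarise(Diff = (!!col)[where == "In"] - (!!col)[where == "Outside"])
}

res <- in_out_diff_sym(toy, "Temp")
print(res)
```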
Yannis Vassiliadis

If you're using the latest dplyr (0.7), you can use .data to refer to a column by its name as a string. Your function would be modified as follows:

DailyInOutDiff <- function (df, variable) {

  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="4"] - .data[[variable]][location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="5"] - .data[[variable]][location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}

The change from variable[...] to .data[[variable]][...] means it now selects the column specified by the string in variable, instead of trying to index the actual string. Running this function with the provided data returns:

DailyInOutDiff(testdf, "Temp")
#> # A tibble: 4 x 5
#> # Groups:   date, sensorheight [2]
#>         date sensorheight     farm location    Diff
#>        <chr>        <int>    <chr>    <chr>   <dbl>
#> 1 2016-04-04            1 McDonald        4 34.3750
#> 2 2016-04-04           16 McDonald        4 22.5250
#> 3 2016-04-04            1 McDonald        5 13.5250
#> 4 2016-04-04           16 McDonald        5  3.6625
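Because the question mentions applying the function to several columns, the same .data[[variable]] pattern can be looped over a vector of column names. The following is a minimal sketch on made-up data; toy, its columns, and the helper in_out_diff are assumptions for illustration, not part of the original function.

```r
library(dplyr)

# Hypothetical toy data with two measurement columns
toy <- data.frame(
  site  = c("a", "a", "b", "b"),
  where = c("In", "Outside", "In", "Outside"),
  Temp  = c(20, 15, 18, 17),
  Hum   = c(60, 50, 55, 58)
)

# Illustrative helper: variable is a column name given as a string
in_out_diff <- function(df, variable) {
  df %>%
    group_by(site) %>%
    summarise(Diff = .data[[variable]][where == "In"] -
                     .data[[variable]][where == "Outside"])
}

# Apply the helper to each column name; the result is a named list of tibbles
diffs <- lapply(c(Temp = "Temp", Hum = "Hum"),
                function(v) in_out_diff(toy, v))
print(diffs$Temp)
print(diffs$Hum)
```

Passing the names as strings keeps the loop simple, since no quoting or unquoting is needed at the call site.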
beigel

The following calls the function DailyInOutDiff and assigns testdf to df and "Temp" to variable.

   test <- DailyInOutDiff(testdf, "Temp")
   test <- DailyInOutDiff(testdf, quote(Temp))

Given what you are trying to do, you want to pass a data frame and a column from that data frame. At present you are passing only the column name, which is a string, and not the column itself. You must change the call to

      test <- DailyInOutDiff(testdf, testdf["Temp"])

Secondly, you are passing the Temp column and trying to filter the variable data frame based on location in the following piece of code.

summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], location = "4")

It must be,

    variable[variable$location=="4",] 

if your call is,

    test <- DailyInOutDiff(testdf, testdf["Temp"]) 

or

   variable[variable$Temp=="4",] 

if your call is,

    test <- DailyInOutDiff(testdf, testdf["Temp"]) 
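As a small base R illustration of the difference between a column name, a one-column data frame, and the column itself, consider the sketch below. It uses a two-row excerpt of the question's data; note that testdf["Temp"] on its own carries no location column, so the sketch filters with the location column of the full data frame instead. This is an illustration of the indexing behaviour, not a drop-in fix for the function.

```r
# Two rows taken from the question's data (sensorheight 1)
df <- data.frame(
  location = c("4", "Outside"),
  Temp     = c(122.8875, 88.5125)
)

as_frame  <- df["Temp"]    # single brackets: a one-column data frame
as_vector <- df[["Temp"]]  # double brackets: the numeric column itself

print(is.data.frame(as_frame))
print(is.numeric(as_vector))
print(names(as_frame))     # only "Temp"; there is no location column here

# Subtract the outside reading from the inside reading
diff_value <- as_vector[df$location == "4"] -
              as_vector[df$location == "Outside"]
print(diff_value)
```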
Linda