I have a data frame that I am trying to manipulate with dplyr for practice. My standalone pipeline works, but whenever I wrap it in a function I get Error: Result must have length 1, not 12. I expect two printed lines. Here is a reproducible example:

library(tidyverse)
library(dplyr)
dat <- data.frame(Class= c("3rd","First","2nd","3rd","First","2nd","3rd","First","2nd","3rd","First","2nd"),
                  Sex= c("Male","Male","Male","Female","Female","Female","Male","Male","Male","Female","Female","Female"),
                  Age= c("Child","Child","Child","Child","Child","Child","Adult","Adult","Adult","Adult","Adult","Adult"),
                  Survived= c("No","No","Yes","No","Yes","No","Yes","Yes","No","Yes","No","Yes"))

multiple_col_selection2 <- function(data, sex_var, age_var){

  data %>%
    group_by(data[,4],data[,2],data[,3])%>%
    filter(.[[2]]== sex_var & .[[3]]== age_var) %>%
    count() %>%
    ungroup()%>%
    add_row({{Sex}} = "Total",  n= sum(.$n)) -> dataset

    paste0(round(dataset$n[1] * 100/dataset$n[3], 2), "% NOT survived.")
    paste0(round(dataset$n[2] * 100/dataset$n[3], 2), "% survived.")
}

multiple_col_selection2(dat,"Female","Adult") #Error: Result must have length 1, not 12

#Whereas if I do it standalone, it works
ex_dat <- dat %>%
          group_by(Sex, Age, Survived)%>%
          filter(Sex== "Female" & Age== "Adult") %>%
          count()%>%
          ungroup()%>%
          add_row(Sex = "Total",  n= sum(.$n))
paste0(round(ex_dat$n[1] * 100/ex_dat$n[3], 2), "% NOT survived.")
#[1] "33.33% NOT survived."
paste0(round(ex_dat$n[2] * 100/ex_dat$n[3], 2), "% survived.")
#[1] "66.67% survived."

I have read these posts:
Wrapping dplyr filter in function results in "Error: Result must have length 4803, not 3"
Creating a function with an argument passed to dplyr::filter what is the best way to work around nse?
However, my approach differs from the ones in those links. This is my first time using dplyr.

halfer
WannabeSmith
  • `data %>% group_by(data[,4])` is bad. If you really want to use column numbers, [use `group_by_at` as shown here](https://stackoverflow.com/a/46436716/903061), e.g., `data %>% group_by_at(.vars = c(4, 2, 3))`. – Gregor Thomas Mar 11 '20 at 16:23
  • Hi Gregor, thank you! I had started with the column names, but errors led me to pick the column position instead :( – WannabeSmith Mar 11 '20 at 16:27
  • Well, `mtcars %>% group_by_at(.vars = c(1, 2, 3))` works for me, grouping by multiple columns. Column names are strongly preferred though. – Gregor Thomas Mar 11 '20 at 16:28
  • Your second link using `filter_` and `lazyeval` is extremely dated; ignore it. Have you read the current [Programming with dplyr vignette](https://dplyr.tidyverse.org/articles/programming.html)? – Gregor Thomas Mar 11 '20 at 16:29
  • You seem to be mixing 3 approaches currently... pick one. `{{Sex}}` would work if you were passing `Sex` in as a column name, e.g., `Sex = Sex` in the function call. Or just use `Sex` if you know that's going to be the column name... – Gregor Thomas Mar 11 '20 at 16:31
  • Hi Mr. Thomas, thank you for the links, the errors are gone. But I will have to wait two days before I can pick an answer, so I will edit the post :) Appreciate your time, sir! – WannabeSmith Mar 11 '20 at 16:42
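
For readers following the `{{ }}` remark in the comments above, here is a minimal sketch of how embracing is typically used: the caller passes a bare column name and the function forwards it to a dplyr verb. The helper name `count_by` and the `toy` data frame are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Hypothetical helper (not from the original post): counts rows per value
# of a column that the caller supplies as a bare, unquoted name.
count_by <- function(data, group_col) {
  data %>%
    count({{ group_col }})  # {{ }} forwards the caller's column to count()
}

# Small made-up data frame, used only to exercise the helper
toy <- data.frame(
  Sex = c("Male", "Female", "Female"),
  Age = c("Child", "Adult", "Adult")
)

count_by(toy, Sex)
```

By contrast, `{{Sex}}` inside a function that has no `Sex` argument has nothing to forward, which is why the comment suggests either passing the column in or simply writing `Sex`.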

1 Answer

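Thank you @Gregor Thomas for helping me out with the errors and pointing me in the right direction. The fix was to group by column position with group_by_at and to refer to Sex and Age by name inside filter and add_row: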

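multiple_col_selection2 <- function(data, sex_var, age_var){

  # Group by Sex, Age and Survived (columns 2, 3, 4), keep the requested
  # subgroup, count each outcome, then append a "Total" row.
  data %>%
    group_by_at(.vars = c(2, 3, 4)) %>%
    filter(Sex == sex_var & Age == age_var) %>%
    count() %>%
    ungroup() %>%
    add_row(Sex = "Total", n = sum(.$n)) -> dataset

  # Rows 1 and 2 hold the "No" and "Yes" counts; row 3 holds the total.
  paste0(round(dataset$n[1] * 100/dataset$n[3], 2), "% NOT survived. ", "And ", round(dataset$n[2] * 100/dataset$n[3], 2), "% survived.")
}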

multiple_col_selection2(dat,"Female","Adult") #Error resolved
#[1] "33.33% NOT survived. And 66.67% survived."
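
As the comments note, column names are strongly preferred over positions. Below is a hedged sketch of a name-based variant, assuming the columns are always called Sex, Age and Survived as in the example data. The function name `survival_summary` and the inner helper `pct` are hypothetical. It looks up the "No" and "Yes" counts by label instead of relying on row order, so it does not need the extra "Total" row.

```r
library(dplyr)

dat <- data.frame(
  Class    = rep(c("3rd", "First", "2nd"), times = 4),
  Sex      = rep(rep(c("Male", "Female"), each = 3), times = 2),
  Age      = rep(c("Child", "Adult"), each = 6),
  Survived = c("No", "No", "Yes", "No", "Yes", "No",
               "Yes", "Yes", "No", "Yes", "No", "Yes")
)

# Hypothetical name-based variant; assumes columns Sex, Age and Survived exist.
survival_summary <- function(data, sex_var, age_var) {
  counts <- data %>%
    filter(Sex == sex_var, Age == age_var) %>%
    count(Survived)                      # one row per Survived value

  total <- sum(counts$n)

  # Percentage of rows with the given Survived label (0 if the label is absent)
  pct <- function(level) {
    round(100 * sum(counts$n[counts$Survived == level]) / total, 2)
  }

  paste0(pct("No"), "% NOT survived. And ", pct("Yes"), "% survived.")
}

survival_summary(dat, "Female", "Adult")
#[1] "33.33% NOT survived. And 66.67% survived."
```

If no rows match the filter, the total is zero and the percentages come out as NaN, so a real version would probably want an explicit check for that case.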
