I am trying to write a generic function for univariate analysis of categorical variables in R. I can pass variables to dplyr, but it's not working for the ggplot code.

Here is my code:

 univariate_catogrical <- function(dataset,variable){
  variable <- enquo(variable)

  percentage <- dataset %>%
    select(!!variable) %>%
    group_by(!!variable) %>%
    summarise(n = n()) %>%
    mutate(percantage = (n / sum(n)) * 100)
  print(percentage)

  dataset %>%
    count(!!variable) %>%
    ggplot(mapping = aes_(x = rlang::quo_expr(!!variable), 
                          y = n, fill = rlang::quo_expr(!!variable))) +
    geom_bar(stat = 'identity',
             colour = 'white') +
    labs(x = "Reason.for.absence" , y = "count") + 
    ggtitle(" Count of Reason for absence") +
    theme(legend.position = "bottom") -> p
  plot(p)

}

When I execute the above function, I get:

> univariate_catogrical(employee_data_Imputed,Reason.for.absence)
# A tibble: 28 x 3
   Reason.for.absence     n percantage
   <fct>              <int>      <dbl>
 1 1                     16      2.23 
 2 2                      1      0.139
 3 3                      1      0.139
 4 4                      2      0.279
 5 5                      3      0.418
 6 6                      7      0.975
 7 7                     15      2.09 
 8 8                      6      0.836
 9 9                      4      0.557
10 10                    23      3.20 
# ... with 18 more rows
 Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `variable` is unknown 

Can someone please suggest how to fix it? I am using the aes_() function to pass the arguments.

Here is a reproducible example:

dput(head(employee_data_Imputed,8))
structure(list(ID = structure(c(11L, 36L, 3L, 7L, 11L, 10L, 20L, 
14L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", 
"32", "33", "34", "35", "36"), class = "factor"), Reason.for.absence = structure(c(26L, 
20L, 23L, 7L, 23L, 22L, 23L, 19L), .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28"), class = "factor"), Month.of.absence = structure(c(7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12"), class = "factor"), Day.of.the.week = structure(c(2L, 
2L, 3L, 4L, 4L, 5L, 5L, 1L), .Label = c("2", "3", "4", "5", "6"
), class = "factor"), Seasons = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("1", "2", "3", "4"), class = "factor"), 
    Transportation.expense = c(289, 118, 179, 279, 289, 361, 
    260, 155), Distance.from.Residence.to.Work = c(36, 13, 51, 
    5, 36, 52, 50, 12), Service.time = c(13, 18, 18, 14, 13, 
    3, 11, 14), Age = c(33, 50, 38, 39, 33, 28, 36, 34), Work.load.Average.day = c(239554, 
    239554, 239554, 239554, 239554, 239554, 239554, 239554), 
    Hit.target = c(97, 97, 97, 97, 97, 97, 97, 97), Disciplinary.failure = structure(c(1L, 
    2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), 
    Education = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
    "2", "3", "4"), class = "factor"), Son = c(2, 1, 0, 2, 2, 
    1, 4, 2), Social.drinker = structure(c(2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L), .Label = c("0", "1"), class = "factor"), Social.smoker = structure(c(1L, 
    1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), 
    Pet = c(1, 0, 0, 0, 1, 4, 0, 0), Weight = c(90, 98, 89, 68, 
    90, 80, 65, 95), Height = c(172, 178, 170, 168, 172, 172, 
    168, 196), Body.mass.index = c(30, 31, 31, 24, 30, 27, 23, 
    25), Absenteeism.time.in.hours = c(4, 0, 2, 4, 2, 8, 4, 40
    )), .Names = c("ID", "Reason.for.absence", "Month.of.absence", 
"Day.of.the.week", "Seasons", "Transportation.expense", "Distance.from.Residence.to.Work", 
"Service.time", "Age", "Work.load.Average.day", "Hit.target", 
"Disciplinary.failure", "Education", "Son", "Social.drinker", 
"Social.smoker", "Pet", "Weight", "Height", "Body.mass.index", 
"Absenteeism.time.in.hours"), row.names = c(NA, 8L), class = "data.frame")
Rohit Haritash
  • Try using the version of ggplot2 from GitHub. That has support for quosures, but the one on CRAN did not (although this may have changed in the last 1-2 days - you can try updating ggplot2 from CRAN first). – Melissa Key Jun 26 '18 at 16:45
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Provide a sample `dataset` so we can actually run the code. But as Melissa pointed out, `ggplot2` as of version 2.2.1 doesn't like quosures. There is a new version that's been submitted to CRAN. For now, try `aes_(x = rlang::quo_smash(variable), ...)` – MrFlick Jun 26 '18 at 16:49
  • You also have a problem at `count(variable) %>%`, I think that should be `count(!!variable) %>%` since the print doesn't show a column named variable. – MrFlick Jun 26 '18 at 16:50
  • @MrFlick I have updated the question with dput. I tried using aes_(x = rlang::quo_smash(variable), ...) but still variable is unknown. Thx – Rohit Haritash Jun 26 '18 at 17:01
  • The error is not related to `ggplot`, you need to add `!!` to `variable` in your `count()` call as @MrFlick wrote. – VFreguglia Jun 26 '18 at 17:05
  • After adding dataset %>% count(!!variable) %>% ggplot(mapping = aes_(x = rlang::quo_smash(!!variable), y = n, fill = rlang::quo_smash(!!variable))) +... I am getting Error: 'quo_smash' is not an exported object from 'namespace:rlang' – Rohit Haritash Jun 26 '18 at 17:11
  • You must have an older version of rlang. Maybe `quo_expr` instead of smash. – MrFlick Jun 26 '18 at 17:14
  • Changed to {quo_expr} but still getting Error in !variable : invalid argument type. I am updating question with changed code. – Rohit Haritash Jun 26 '18 at 17:18

1 Answer

Here are the parts that need to change:

dataset %>%
    count(!!variable) %>%
    ggplot(mapping = aes_(x = rlang::quo_expr(variable), y = quote(n), fill = rlang::quo_expr(variable))) +
    ...

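Three things change. You need to unquote "variable" with !! inside count(). You don't use !! with quo_expr(), because it takes the quosure itself and returns the bare expression. And you need to quote every parameter when using aes_(), which is why n becomes quote(n).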
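Putting those pieces together, the whole function might look like the sketch below. A few things here are my assumptions rather than part of the original code: the corrected name univariate_categorical, returning the plot instead of calling plot(), the built-in iris data standing in for the question's dataset, and rlang::quo_squash() in place of quo_expr() (quo_squash is the name current rlang versions export for the same operation). Newer ggplot2 releases may also warn that aes_() is deprecated.

```r
library(dplyr)
library(ggplot2)

# Sketch only: function name and return value are illustrative choices.
univariate_categorical <- function(dataset, variable) {
  variable <- rlang::enquo(variable)        # capture the bare column name
  var_sym  <- rlang::quo_squash(variable)   # plain symbol; note: no !! here

  counts <- dataset %>%
    count(!!variable) %>%                   # unquote inside dplyr verbs
    mutate(percentage = n / sum(n) * 100)
  print(counts)

  # aes_() evaluates its arguments, so each one must already be quoted
  ggplot(counts, aes_(x = var_sym, y = quote(n), fill = var_sym)) +
    geom_bar(stat = "identity", colour = "white") +
    labs(x = rlang::as_name(variable), y = "count") +
    theme(legend.position = "bottom")
}

# iris is used purely as stand-in data for the example
p <- univariate_categorical(iris, Species)
```

Returning the ggplot object leaves it to the caller to print or save it, which also makes the function easier to check.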

With this code and the test data, the function produces the expected bar plot.
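As mentioned in the comments, newer ggplot2 releases understand quosures directly. Assuming ggplot2 3.0.0 or later, a version without aes_() could look roughly like this; plot_counts is a hypothetical name, iris is stand-in data, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Assumes ggplot2 >= 3.0.0, where aes() supports !! unquoting.
# plot_counts is a hypothetical helper name used for illustration.
plot_counts <- function(dataset, variable) {
  variable <- rlang::enquo(variable)
  dataset %>%
    count(!!variable) %>%
    ggplot(aes(x = !!variable, y = n, fill = !!variable)) +
    geom_col(colour = "white") +
    theme(legend.position = "bottom")
}

p2 <- plot_counts(iris, Species)
```

Here the same !! works in both the dplyr verbs and aes(), so there is no need to convert the quosure to an expression or to quote n by hand.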

MrFlick