
I would like to adjust the code below. I am averaging `time` for Fridays and Thursdays, but the output shows an average for Thursday twice, because my data contains both "Thursday" (capitalized) and "thursday" (lowercase). How can I adjust the code so that Thursday and thursday are treated as the same value?

library(dplyr)

Test <- structure(list(date1 = as.Date(c("2021-11-01","2021-11-01","2021-11-01","2021-11-01")),
                       date2 = as.Date(c("2021-10-22","2021-10-22","2021-10-28","2021-10-30")),
                       Week = c("Friday", "Friday", "Thursday", "thursday"),
                       Category = c("FDE", "FDE", "FDE", "FDE"),
                       time = c(4, 6, 6, 3)), class = "data.frame",row.names = c(NA, -4L))
meanTest1 <- Test %>%
  group_by(Week, Category) %>%
  dplyr::summarize(mean(time))

> meanTest1
  Week     Category `mean(time)`
1 Friday   FDE                 5
2 thursday FDE                 3
3 Thursday FDE                 6
Antonio

1 Answer

If the result should be in title case, use `toTitleCase` from the tools package; otherwise, convert the 'Week' column to lower case (`tolower`) or upper case (`toupper`). Apply the conversion inside `group_by` so that both spellings fall into the same group.

library(dplyr)
Test %>%
    group_by(Week = tools::toTitleCase(Week), Category) %>% 
    summarise(time = mean(time, na.rm = TRUE), .groups = 'drop')

Output:

# A tibble: 2 × 3
  Week     Category  time
  <chr>    <chr>    <dbl>
1 Friday   FDE        5  
2 Thursday FDE        4.5
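
As a minimal sketch of the lower-case alternative mentioned above, assuming the same `Test` columns as in the question (rebuilt here without the date columns so the snippet is self-contained; `meanLower` is a name introduced only for illustration):

```r
library(dplyr)

# Same Week/Category/time values as in the question (date columns omitted)
Test <- data.frame(
  Week = c("Friday", "Friday", "Thursday", "thursday"),
  Category = "FDE",
  time = c(4, 6, 6, 3)
)

# tolower() maps "Thursday" and "thursday" to the same key before grouping
meanLower <- Test %>%
  group_by(Week = tolower(Week), Category) %>%
  summarise(time = mean(time, na.rm = TRUE), .groups = "drop")

meanLower
```

The group labels then come out in lower case ("friday", "thursday"), so this variant fits when the display case of the label does not matter.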
akrun
  • Thanks @akrun. Just one question: is there any difference between `dplyr::summarize(mean(time))` and `summarise(time = mean(time, na.rm = TRUE), .groups = 'drop')`? – Antonio Nov 05 '21 at 16:07
  • @Antonio It is just to prevent the message in the output: "`summarise()` has grouped output by 'Week'. You can override using the `.groups` argument." I always use that instead of the default option, which drops only the last grouping element. The `na.rm = TRUE` is also useful when there are NA elements in the column. By default, it is `na.rm = FALSE`, and if there are any NA values, then `mean` returns `NA` – akrun Nov 05 '21 at 16:08
  • Yes @akrun, thanks! =) – Antonio Nov 05 '21 at 16:52
  • About `na.rm = TRUE`, see https://stackoverflow.com/questions/69857915/problem-using-na-rm-true-in-summarize-in-r-code – Antonio Nov 05 '21 at 18:48
  • @Antonio Your function is not clear, so I posted a solution with the actual function that does Mode – akrun Nov 05 '21 at 18:56
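
The two points from the comments (`na.rm` and `.groups`) can be checked with a small sketch. The data frame `TestNA` is hypothetical: it copies the question's values but replaces one `time` with `NA`, and the result names are introduced only for illustration.

```r
library(dplyr)

# Hypothetical data: the question's values with one missing time for Friday
TestNA <- data.frame(
  Week = c("Friday", "Friday", "Thursday", "thursday"),
  Category = "FDE",
  time = c(4, NA, 6, 3)
)

# Default na.rm = FALSE: a single NA makes the whole group's mean NA
withNA <- TestNA %>%
  group_by(Week = tools::toTitleCase(Week), Category) %>%
  summarise(time = mean(time), .groups = "drop")

# na.rm = TRUE: NA values are dropped before averaging
withoutNA <- TestNA %>%
  group_by(Week = tools::toTitleCase(Week), Category) %>%
  summarise(time = mean(time, na.rm = TRUE), .groups = "drop")

# Without .groups, only the last grouping variable (Category) is dropped,
# so the result is still grouped by Week
stillGrouped <- TestNA %>%
  group_by(Week = tools::toTitleCase(Week), Category) %>%
  summarise(time = mean(time, na.rm = TRUE))

group_vars(stillGrouped)  # "Week"
group_vars(withoutNA)     # character(0)
```

With `.groups = "drop"` the result is an ungrouped tibble, so later steps do not silently operate per `Week`.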