
I'm trying to find which origin has the maximum number of delayed flights using the nycflights13 package, and I can't figure out how to group by a "chr" (character) column.

library(nycflights13)
library(dplyr)
flights2 <- mutate(flights, factori = as.factor(origin))
flights2 %>% 
  filter(dep_delay > 2) %>% 
  select(dep_delay, factori) %>% 
  group_by(factori)

Sample of output: [screenshot omitted]

How can I get them grouped together? How can I find the max count?

  • What do you mean by `the factor origin isn't grouping together`? The output of your code is grouped by `factori`. – Ronak Shah Feb 10 '21 at 06:18
  • My output still has them separated. So I have two columns dep_delay and factori but in the factori column, I still see several values with matching names that aren't grouped together. EG: rows 1-4 have LGA, EWR, JFK, JFK. I want all the JFKs together, all the EWRs together, and eventually I want to get a count of each. Would I use the summarise() function? I've included a sample of the output – snicksnackpaddywhack91 Feb 10 '21 at 06:23
  • Do you want the `tally()` of each? There are only three levels/origins, but if you just want the frequency of each in the dataset, add `%>% tally()` after your `group_by`. – George Feb 10 '21 at 06:29

1 Answer


group_by doesn't change the structure of the data. The number of rows and columns stays the same after group_by, and the rows are not reordered; it only records which column(s) define the groups. It is what you do after group_by that decides the output.
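
As a minimal sketch of that point (the names `delayed` and `grouped` are made up for this illustration, and it groups by the original `origin` column rather than `factori`), you can compare the data before and after group_by:

```r
library(nycflights13)
library(dplyr)

# Hypothetical helper objects, named only for this illustration
delayed <- flights %>%
  filter(dep_delay > 2) %>%
  select(dep_delay, origin)

grouped <- delayed %>% group_by(origin)

# Same number of rows and the same row order: nothing was moved or collapsed
nrow(grouped) == nrow(delayed)
identical(grouped$origin, delayed$origin)

# The only addition is grouping metadata stored on the tibble
group_vars(grouped)   # name of the grouping column
n_groups(grouped)     # number of distinct origins
```

This is why the printed output still shows LGA, EWR and JFK interleaved: the grouping only takes effect in the verb that follows.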

To get the maximum dep_delay for each factori, you can do:

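library(nycflights13)
library(dplyr)

# Recreate flights2 from the question: origin stored as a factor
flights2 <- mutate(flights, factori = as.factor(origin))

flights2 %>% 
  filter(dep_delay > 2) %>%        # keep flights that left more than 2 minutes late
  select(dep_delay, factori) %>% 
  group_by(factori) %>%
  summarise(max = max(dep_delay, na.rm = TRUE))   # one row per origin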

#  factori   max
#* <fct>   <dbl>
#1 EWR      1126
#2 JFK      1301
#3 LGA       911
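
The question also asks for the "max count". If that means the number of delayed flights per origin rather than the longest delay, a sketch along the lines of the `tally()` suggestion in the comments could look like this. It assumes "delayed" means `dep_delay > 2`, as in the question; the names `delay_counts` and `n_delayed` are made up for this illustration.

```r
library(nycflights13)
library(dplyr)

# count() is shorthand for group_by() followed by tally()
delay_counts <- flights %>%
  filter(dep_delay > 2) %>%    # filter() also drops rows where dep_delay is NA
  count(origin, name = "n_delayed", sort = TRUE)

delay_counts                                # one row per origin, largest count first
slice_max(delay_counts, n_delayed, n = 1)   # the origin with the most delayed flights
```

`slice_max()` requires dplyr 1.0.0 or later; with an older version, taking the first row of the sorted result would serve the same purpose.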

summarise usually returns one row per group, whereas mutate keeps the same number of rows as the original data.
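
To make the difference concrete, here is a small sketch that applies both verbs to the same grouped data. The object names (`delayed`, `per_group`, `per_row`) and the column name `max_delay` are made up for this illustration.

```r
library(nycflights13)
library(dplyr)

delayed <- flights %>%
  filter(dep_delay > 2) %>%
  select(dep_delay, origin)

# summarise(): collapses each group to a single row
per_group <- delayed %>%
  group_by(origin) %>%
  summarise(max_delay = max(dep_delay))

# mutate(): keeps every row and repeats the group's maximum on each of them
per_row <- delayed %>%
  group_by(origin) %>%
  mutate(max_delay = max(dep_delay)) %>%
  ungroup()

nrow(per_group)                  # one row per origin
nrow(per_row) == nrow(delayed)   # row count unchanged
```

The mutate form is handy when you want to compare each flight against its group's maximum, for example by filtering on `dep_delay == max_delay`.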

Ronak Shah
  • TYVM! So, follow-up question: is it even necessary to change the "chr" to a factor in this situation? Could I have just done `group_by(origin)` and then the summarise? – snicksnackpaddywhack91 Feb 10 '21 at 06:39
  • It is not necessary to change character to factor. `group_by` works the same way on character variables as well and would give the same output. – Ronak Shah Feb 10 '21 at 06:40
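
As a quick check of that last comment, the sketch below runs the same summary once on the character column `origin` and once on a factor copy, then compares the results. The names `by_chr` and `by_fct` are made up for this illustration.

```r
library(nycflights13)
library(dplyr)

# Group directly on the character column
by_chr <- flights %>%
  filter(dep_delay > 2) %>%
  group_by(origin) %>%
  summarise(max = max(dep_delay, na.rm = TRUE))

# Group on a factor copy, as in the question
by_fct <- flights %>%
  mutate(factori = as.factor(origin)) %>%
  filter(dep_delay > 2) %>%
  group_by(factori) %>%
  summarise(max = max(dep_delay, na.rm = TRUE))

identical(by_chr$max, by_fct$max)                       # same values
identical(by_chr$origin, as.character(by_fct$factori))  # same groups, same order
```

The only visible difference is the type of the grouping column in the result (`<chr>` versus `<fct>`).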