
I tried the code from the question "Mean by factor by level", but it does not work. Here is my situation: from the flights dataset, I want to know the average flight delay of all planes from the carrier UA.

library(nycflights13)
data(flights)
mean(flights$air_time[flights$carrier == "UA"])

But the result is just

[1] NA

What did I do wrong?

Ric S
duniss
  • Seems you did nothing wrong, but you can't take the mean if it includes `NA`: `mean(c(1,2,3,NA))` – s_baldur Jun 19 '20 at 09:49
  • Why are you extracting `air_time`? Delay is given by `arr_delay` or `dep_delay`, for arrival and departure delay respectively. So you can do `mean(flights$arr_delay[flights$carrier == "UA"], na.rm = TRUE)` or `mean(flights$dep_delay[flights$carrier == "UA"], na.rm = TRUE)` – Ronak Shah Jun 19 '20 at 09:50
  • @Ronak Shah you are right, thank you. – duniss Jun 19 '20 at 10:16

3 Answers


Since there are missing values (NA) in the dataset, you need to specify the argument na.rm = TRUE within the mean function. Otherwise, if at least one value is NA, the mean function (as well as other functions like sum, min, max, ...) will return NA.

mean(flights$air_time[flights$carrier == "UA"], na.rm = TRUE)
# [1] 211.7914
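
To see the effect in isolation, here is a minimal sketch on a small made-up vector (the name `air_time` and its values are hypothetical stand-ins for `flights$air_time`, not real data). It also counts the missing values, which is worth checking before dropping them silently.

```r
# Toy stand-in for flights$air_time (values are made up for illustration)
air_time <- c(120, 95, NA, 210)

mean(air_time)                 # NA: a single missing value makes the result NA
mean(air_time, na.rm = TRUE)   # 141.6667: mean of the three observed values
sum(is.na(air_time))           # 1: number of values that na.rm = TRUE drops
```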
Ric S

If you are looking for the mean air time of the flights where carrier == "UA", you could try a dplyr solution using summarise.

This solution handles missing values by passing na.rm = TRUE to mean.

library(dplyr)

flights %>%
  filter(carrier == "UA") %>%                               # keep only UA flights
  summarise(mean_air_time = mean(air_time, na.rm = TRUE))   # drop NAs, then average
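
If dplyr is not available, a per-carrier mean can also be sketched in base R with `tapply`. The data frame `toy` below is a hypothetical miniature of the flights table with made-up values, used only so the example runs on its own.

```r
# Hypothetical miniature of the flights table (made-up values)
toy <- data.frame(
  carrier  = c("UA", "UA", "AA", "AA", "UA"),
  air_time = c(200, NA, 150, 170, 220)
)

# Mean air time per carrier; na.rm = TRUE is passed on to mean() for each group
by_carrier <- tapply(toy$air_time, toy$carrier, mean, na.rm = TRUE)

by_carrier[["UA"]]   # 210: mean of 200 and 220, the NA is dropped
by_carrier[["AA"]]   # 160: mean of 150 and 170
```

This returns one value per carrier at once, so a single carrier such as "UA" can be picked out by name afterwards.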
cmirian

As Ric S says, use na.rm = TRUE. Keep in mind that NA propagates: when the input contains an NA, most summary functions return NA by default, so you may run into the same problem with similar functions such as median, max, min, etc.
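
A short sketch of that behaviour, using a made-up vector `x` (hypothetical values, not taken from the flights data):

```r
x <- c(3, 7, NA, 1)   # made-up values with one missing entry

# Without na.rm, each of these returns NA
median(x)
max(x)
min(x)
sum(x)

# With na.rm = TRUE, the NA is dropped first
median(x, na.rm = TRUE)   # 3
max(x, na.rm = TRUE)      # 7
min(x, na.rm = TRUE)      # 1
sum(x, na.rm = TRUE)      # 11
```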