I'm using the `flights` dataset from the nycflights13 package, and I noticed something strange when I tried to arrange the flights to find the tail number with the worst on-time record:

    # Which plane has the worst on-time record?
    worst_delay %>%
      group_by(tailnum) %>%
      select(tailnum, arr_delay, dep_delay) %>%
      mutate(sum_delay = sum(arr_delay, dep_delay)) %>%
      arrange(desc(sum_delay))
    View(worst_delay)

I got the following:

Console

However, when I run the same command and use View(worst_delay), I get this:

Dataframe

Is there something I'm missing?

Shawn Hemelstrand
  • Hi. Please provide a reproducible example. Pictures are not reproducible. Refer to https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to find out how to do that – Julian_Hn Aug 16 '21 at 10:54
  • I said it's from the nycflights13 dataset. I'll edit in the code for loading that library if it's helpful. – Shawn Hemelstrand Aug 16 '21 at 10:57
  • Which I would need to download and install ... It's good practice to provide as much information as you can if you expect us to provide help ... – Julian_Hn Aug 16 '21 at 10:58
  • Idk, that's just what the book says. – Shawn Hemelstrand Aug 16 '21 at 11:00
  • You haven't assigned your result to any object. You're just viewing the original `worst_delay` dataset. If you want to `View()` your calculations, you could just add a `%>% View()` to the end of the code. – caldwellst Aug 16 '21 at 11:01
  • I just noticed that a moment ago. Edited it into my answer. – Shawn Hemelstrand Aug 16 '21 at 11:03
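
As a minimal sketch of the point made in the comments above (not the asker's actual data; `delays` and `sorted_delays` are hypothetical names on a made-up toy table), a pipeline that is not assigned only prints its result, while the original object stays unchanged:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
delays <- data.frame(
  tailnum   = c("A1", "B2", "C3"),
  arr_delay = c(5, 40, -3),
  stringsAsFactors = FALSE
)

# The sorted result is printed to the console, but `delays` is not modified
delays %>% arrange(desc(arr_delay))
delays$tailnum   # still in the original row order

# Assigning the pipeline stores the sorted result under a name
sorted_delays <- delays %>% arrange(desc(arr_delay))
sorted_delays$tailnum   # now ordered by descending arr_delay
```

In an interactive RStudio session, ending the pipeline with `%>% View()` instead would show the computed result without storing it.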

1 Answer

I figured it out. I forgot to start the pipeline from the dataset and assign the result (it was `worst_delay %>%`, but it should have been `worst_delay <- flights %>%`):

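    library(dplyr)         # provides %>%, group_by(), mutate(), arrange()
    library(nycflights13)  # provides the flights dataset
    data(flights)

    # Assign the pipeline's result so that View() shows the computed table
    worst_delay <- flights %>%
      group_by(tailnum) %>%
      select(tailnum, arr_delay, dep_delay) %>%
      mutate(sum_delay = sum(arr_delay, dep_delay)) %>%
      arrange(desc(sum_delay))
    View(worst_delay)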
Shawn Hemelstrand
  • I just see a link to this post. – Shawn Hemelstrand Aug 16 '21 at 11:04
  • I'm not sure this would lead to your desired output, as the worst on-time record refers to the plane that had the greatest number of delays in the data set. So you are dealing with a count here. – Anoushiravan R Aug 16 '21 at 11:07
  • What is a better alternative then? – Shawn Hemelstrand Aug 16 '21 at 11:18
  • `flights %>% filter(!is.na(tailnum)) %>% mutate(on_time = !is.na(arr_time) & (arr_delay <= 0)) %>% group_by(tailnum) %>% summarise(on_time = mean(on_time), n = n()) %>% filter(min_rank(on_time) == 1)` This is the answer given in this book for the exact same question. – Anoushiravan R Aug 16 '21 at 11:19
  • My book doesn't seem to have an answer section. Tried looking at the end of the chapters and the end of the book. Not sure if I just have a bad copy, but that would be tremendously helpful to have. – Shawn Hemelstrand Aug 16 '21 at 11:23
  • No, this is another book, which contains solutions for R for Data Science and is indeed extremely helpful and educational. Use it in conjunction with the book. – Anoushiravan R Aug 16 '21 at 11:26
  • What's the name, if you don't mind me asking? – Shawn Hemelstrand Aug 16 '21 at 11:26
  • I am so sorry, I didn't notice I posted a link to this post instead. Here it is: https://jrnold.github.io/r4ds-exercise-solutions/transform.html – Anoushiravan R Aug 16 '21 at 11:27
  • Thanks a bunch! I end up scratching my head so often cuz this stuff is so unintuitive to me. Appreciate it. – Shawn Hemelstrand Aug 16 '21 at 11:29
  • You're welcome. These two books are fantastic; just take your time with them and don't rush to finish them as soon as possible, because they cover a lot of ground and the exercises have numerous great tips. – Anoushiravan R Aug 16 '21 at 11:31
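
For readability, the one-liner quoted in the comments above can be laid out one step per line. The sketch below runs it on a small made-up table rather than on the real `flights` data (`toy_flights` and `on_time_record` are hypothetical names, and the values are invented for illustration), so the result applies only to this toy input:

```r
library(dplyr)

# Hypothetical toy data; the real flights table has many more rows and columns
toy_flights <- data.frame(
  tailnum   = c("A1", "A1", "B2", "B2", NA),
  arr_time  = c(900, 1000, 1100, NA, 1200),
  arr_delay = c(-5, 10, 20, NA, 0),
  stringsAsFactors = FALSE
)

on_time_record <- toy_flights %>%
  # drop flights with no recorded tail number
  filter(!is.na(tailnum)) %>%
  # a flight counts as on time if it arrived and was not late
  mutate(on_time = !is.na(arr_time) & (arr_delay <= 0)) %>%
  group_by(tailnum) %>%
  # proportion of on-time flights and number of flights per plane
  summarise(on_time = mean(on_time), n = n()) %>%
  # keep the plane(s) with the lowest on-time proportion
  filter(min_rank(on_time) == 1)

on_time_record
# on this toy input: one row with tailnum "B2", on_time 0, n 2
```

Unlike the summed-delay approach, this ranks planes by the share of flights that arrived on time, so a plane with only a handful of flights can come out worst; the `n` column is kept so that case is visible.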