I've looked for similar threads but can't find a solution.

I've grouped the dataset below by carrier and successfully created new variables for the average and total delay times. Now I simply want to arrange the data by average delay, but when I run the code below it returns the same data for every row. Can anyone help me figure out where I went wrong?

I'm using the dplyr package with the "flights" dataset, and I've filtered out the NA values using:

filter(!is.na(dep_delay), !is.na(arr_delay))

I got the data and exercise from section 5.6.7 of this resource: http://r4ds.had.co.nz/transform.html#exercises-11

bycarrier %>%  
  transmute(  
    arrsum = sum(arr_delay),  
    arravg = mean(arr_delay),  
    depsum = sum(dep_delay),  
    depavg = mean(dep_delay)   
  ) %>%  
  arrange(desc(arravg))

Returns:

Adding missing grouping variables: `carrier`
Source: local data frame [327,346 x 5]
Groups: carrier [16]

   carrier arrsum  arravg depsum   depavg  
     <chr>  <dbl>   <dbl>  <dbl>    <dbl>  
1       F9  14928 21.9207  13757 20.20117  
2       F9  14928 21.9207  13757 20.20117  
3       F9  14928 21.9207  13757 20.20117  
4       F9  14928 21.9207  13757 20.20117  
5       F9  14928 21.9207  13757 20.20117  
6       F9  14928 21.9207  13757 20.20117  
7       F9  14928 21.9207  13757 20.20117  
8       F9  14928 21.9207  13757 20.20117  
9       F9  14928 21.9207  13757 20.20117  
10      F9  14928 21.9207  13757 20.20117  
# ... with 327,336 more rows  
Cyrus Mohammadian
  • check order() or sort() – Dinesh.hmn Sep 09 '16 at 13:12
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great R example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – lmo Sep 09 '16 at 13:13
  • order() and sort() return the following error: Error in sort(., arravg) : object 'arravg' not found – originalgriefster Sep 09 '16 at 14:01

1 Answer

I think you need to use summarise instead of transmute, as follows:

bycarrier %>%  
  summarise(  
    arrsum = sum(arr_delay),  
    arravg = mean(arr_delay),  
    depsum = sum(dep_delay),  
    depavg = mean(dep_delay)   
  ) %>%  
  arrange(desc(arravg))

That gives the following output:

# A tibble: 16 x 5
   carrier arrsum     arravg  depsum    depavg
     <chr>  <dbl>      <dbl>   <dbl>     <dbl>
1       F9  14928 21.9207048   13757 20.201175
2       FL  63868 20.1159055   59074 18.605984
3       EV 807324 15.7964311 1013928 19.838929
4       YV   8463 15.5569853   10281 18.898897
5       OO    346 11.9310345     365 12.586207
6       MQ 269767 10.7747334  261521 10.445381
7       WN 116214  9.6491199  212717 17.661657
8       B6 511194  9.4579733  700883 12.967548
9       9E 127624  7.3796692  284306 16.439574
10      UA 205589  3.5580111  694361 12.016908
11      US  42232  2.1295951   74261  3.744693
12      VX   9027  1.7644644   65263 12.756646
13      DL  78366  1.6443409  439595  9.223950
14      AA  11638  0.3642909  273758  8.569130
15      HA  -2365 -6.9152047    1676  4.900585
16      AS  -7041 -9.9308886    4134  5.830748
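
As for why the two verbs differ: on a grouped data frame, transmute (like mutate) computes each expression per group but returns one row for every original row, so the group-level value is repeated. summarise collapses each group to a single row instead. Here is a minimal sketch on a small made-up table. The `toy` data and the names `by_carrier`, `per_row` and `per_group` are assumptions for illustration, not part of the flights data.

```r
library(dplyr)

# Toy stand-in for the flights data (values are made up for illustration)
toy <- tibble(
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  arr_delay = c(10, 20, 5, 15, 40)
)

by_carrier <- group_by(toy, carrier)

# transmute() computes the group mean but keeps one row per original row,
# so the same value repeats within each carrier (5 rows here)
per_row <- by_carrier %>%
  transmute(arravg = mean(arr_delay)) %>%
  ungroup()

# summarise() collapses each group to a single row (2 rows here),
# which is what arrange() then needs to rank the carriers
per_group <- by_carrier %>%
  summarise(arravg = mean(arr_delay)) %>%
  arrange(desc(arravg))
```

Printing `per_row` and `per_group` side by side should make the difference visible: the first has as many rows as `toy`, the second has one row per carrier.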
sjakw
  • Yup, that worked. Thank you so much. I'm still a little unsure why it doesn't work with transmute, but I'm still learning so I'm sure I'll figure it out. – originalgriefster Sep 09 '16 at 18:34