I have a local data frame that I'm trying to group by two variables ("yr" and "mo"), take the mean of the data in each group, and sort the results so that the most recent data appears at the top. However, I can't figure out how to get the "yr" variable to sort in descending order. It is always displayed in ascending order.

library(dplyr)
df <- tbl_df(data.frame(yr = c(2009, 2009, 2009, 2010, 2010, 2010, 2011, 2011, 2011), 
                    qtr = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                    mo = c(1, 1, 2, 3, 3, 4, 5, 5, 5), 
                    date = as.Date(c("2009-01-01", "2009-01-02","2009-02-01",
                                     "2010-03-01","2010-03-02","2010-04-01",
                                     "2011-05-01","2011-05-02","2011-05-03")),
                    x = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
                    y = c(2, 4, 6, 8, 10, 12, 14, 16, 18),
                    z = c(1, 3, 5, 7, 9, 11, 13, 15, 17)))
df %>%
    select(yr, mo, x:z) %>%
    group_by(yr, mo) %>%
    summarize_each(funs(mean)) %>%
    arrange(desc(yr), desc(mo))

Source: local data frame [5 x 5]
Groups: yr [3]

 yr    mo     x     y     z
(dbl) (dbl) (dbl) (dbl) (dbl)
1  2009     2    30     6     5
2  2009     1    15     3     2
3  2010     4    60    12    11
4  2010     3    45     9     8
5  2011     5    80    16    15

If I remove "desc(yr)" and just use "yr" in the arrange() function, I get the same results.

df %>%
      select(yr, mo, x:z) %>%
      group_by(yr, mo) %>%
      summarize_each(funs(mean)) %>%
      arrange(yr, desc(mo))

Source: local data frame [5 x 5]
Groups: yr [3]

 yr    mo     x     y     z
(dbl) (dbl) (dbl) (dbl) (dbl)
1  2009     2    30     6     5
2  2009     1    15     3     2
3  2010     4    60    12    11
4  2010     3    45     9     8
5  2011     5    80    16    15

If I remove the "desc(mo)" and just use "mo" in the arrange function, I get the expected results and the data is sorted on "mo" in ascending order.

df %>%
      select(yr, mo, x:z) %>%
      group_by(yr, mo) %>%
      summarize_each(funs(mean)) %>%
      arrange(yr, mo)

Source: local data frame [5 x 5]
Groups: yr [3]

 yr    mo     x     y     z
(dbl) (dbl) (dbl) (dbl) (dbl)
1  2009     1    15     3     2
2  2009     2    30     6     5
3  2010     3    45     9     8
4  2010     4    60    12    11
5  2011     5    80    16    15

Why does the "yr" variable not respond to the desc() function when the "mo" variable does? How do I get the results sorted by "yr" in descending order and then by "mo" in descending order? Thanks!

grove80904

1 Answer

It looks like the grouping is interfering with the arranging.

Try adding an ungroup():

df %>%
  select(yr, mo, x:z) %>%
  group_by(yr, mo) %>%
  summarise_each(funs(mean)) %>%
  ungroup() %>%
  arrange(desc(yr), desc(mo))

This should give you

    yr mo  x  y  z
1 2011  5 80 16 15
2 2010  4 60 12 11
3 2010  3 45  9  8
4 2009  2 30  6  5
5 2009  1 15  3  2

which I think is what you want: both yr and mo descending.
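
For readers on a newer dplyr, here is a minimal sketch of the same pipeline. `summarise_each()` and `funs()` were deprecated in later dplyr releases, so this version assumes dplyr 1.0.0 or later and uses `across()` together with `.groups = "drop"`, which removes the grouping inside `summarise()` so that no separate `ungroup()` is needed. The data frame is a trimmed copy of the one in the question (the unused `qtr` and `date` columns are left out), and the name `result` is only for illustration.

```r
library(dplyr)

# Trimmed copy of the question's data: only the columns the pipeline uses
df <- tibble(
  yr = c(2009, 2009, 2009, 2010, 2010, 2010, 2011, 2011, 2011),
  mo = c(1, 1, 2, 3, 3, 4, 5, 5, 5),
  x  = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  y  = c(2, 4, 6, 8, 10, 12, 14, 16, 18),
  z  = c(1, 3, 5, 7, 9, 11, 13, 15, 17)
)

result <- df %>%
  group_by(yr, mo) %>%
  # Mean of x, y and z per (yr, mo); .groups = "drop" returns an ungrouped tibble
  summarise(across(x:z, mean), .groups = "drop") %>%
  # Most recent year first, then most recent month within each year
  arrange(desc(yr), desc(mo))

print(result)
```

The rows should come out in the same order as the table above, starting with 2011 and ending with 2009.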

tumultous_rooster
  • Thanks! This works. However, I'm confused by the need to ungroup. According to the comments in [http://stackoverflow.com/questions/21736956/can-i-switch-the-grouping-variable-in-a-single-dplyr-statement](http://stackoverflow.com/questions/21736956/can-i-switch-the-grouping-variable-in-a-single-dplyr-statement), ungrouping is no longer needed after a `mutate` / `summarize` operation with the new dplyr package. Is ungrouping needed in the current case because I'm using `summarize_each` instead of `summarize`? – grove80904 Nov 11 '15 at 13:54
  • Figured it out while watching this dplyr tutorial (around minute 22 of the video) [https://www.youtube.com/watch?v=2mh1PqfsXVI](https://www.youtube.com/watch?v=2mh1PqfsXVI). When grouping by 2 or more variables, only the last variable is removed after a `mutate`/`summarize` command. Therefore the `ungroup` command is needed. If I had only grouped by 1 variable, the `ungroup` command wouldn't be necessary. – grove80904 Nov 12 '15 at 19:33
  • Ah! Great to find the official reason! Thanks for the follow up. – tumultous_rooster Nov 12 '15 at 19:49
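
The behaviour described in the comments can be checked directly with `group_vars()`. The sketch below uses a small made-up data frame (`toy`, `two_keys` and `one_key` are illustrative names, not from the question) and relies on the default behaviour of `summarise()`, which drops only the last grouping variable. Note that it is `summarise()` that peels off a grouping level; `mutate()` leaves the grouping unchanged. Recent dplyr versions may also print a message about the remaining grouping.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values)
toy <- tibble(
  yr = c(2009, 2009, 2010),
  mo = c(1, 2, 3),
  x  = c(10, 30, 45)
)

# Grouped by two variables: summarise() drops only the last one ("mo")
two_keys <- toy %>%
  group_by(yr, mo) %>%
  summarise(x = mean(x))
group_vars(two_keys)  # "yr"

# Grouped by one variable: summarise() drops it, leaving no groups
one_key <- toy %>%
  group_by(yr) %>%
  summarise(x = mean(x))
group_vars(one_key)   # character(0)
```

This is why the result in the question was still grouped by `yr` when it reached `arrange()`, and why the `ungroup()` call was needed there.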