
I have a list of dates, and I need to report them by month and year (Mar 2020, Apr 2020, etc.). However, when I extract the month and year from the date, I get a character string instead of a date, so when I plot it with ggplot2, the months appear in alphabetical order instead of chronological order.

I know I can specify the order manually with factor(), but typing out every month-year combination would be painful. Is there a more efficient way to tackle this problem? I tried wrapping my date in my() from lubridate, but that didn't work.

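# My sample data
library(dplyr)
library(lubridate)  # provides ymd()

test <- tibble(date   = seq(ymd("2021-01-01"), ymd("2021-12-31"), by = "1 day"),
               values = c(1001:1182, 800:900, 1:82),
               # integer month index (1-12), used only for grouping
               month  = cut.Date(date, breaks = "1 month", labels = FALSE)) %>%
  group_by(month) %>%
  # replace the index with a "%b %Y" label such as "Jan 2021" (a character string)
  mutate(month = format(last(date), "%b %Y")) %>%
  ungroup()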

Here's a simple plot showing that the order is alphabetical instead of chronological:

library(ggplot2)
ggplot(test, aes(x = month, y = values)) +
  geom_col()
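The root cause can be reproduced without ggplot2. A minimal sketch on a small hypothetical vector of labels: ggplot2 places character values on a discrete axis in sorted order, which is the same order factor() uses for its default levels, so "Apr" lands before "Feb" and "Jan".

```r
# Hypothetical month labels in chronological order, stored as character strings
month <- c("Jan 2021", "Feb 2021", "Mar 2021", "Apr 2021")

# factor() sorts the unique strings to build its default levels,
# which gives alphabetical rather than chronological order
month_levels <- levels(factor(month))
print(month_levels)
```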


J.Sabree
  • See the dupe links for more explanation and details. Long story short: every question (on SO and elsewhere) asking to adjust the order of axis labels in `ggplot2` is resolved using `factor`s; this is the canonical approach, its design is intentional, and at some point you might agree that its behavior is logical and reasonable (given use-cases outside of your immediate needs). – r2evans Mar 16 '22 at 14:13
  • @r2evans, I appreciate the link to the other question, but I think this question is distinct because it involves a specific use case (dates). The answer below, which works for this use case, would only work for dates and not other character strings. Can you remove the duplication tag? – J.Sabree Mar 16 '22 at 14:26
  • I don't see a distinction. The issue is resolved by changing `month` to a `factor` (if only internally). What makes you think that using `factor`s in your accepted answer is different from the dupe links explaining how to use `factor`s to order the axis? Regardless, having it marked as a dupe doesn't reflect poorly on you; what's the rationale? – r2evans Mar 16 '22 at 14:35
  • @r2evans, the answer below uses the my() function, which is specific to month-year data. This solution is much more elegant than manually specifying each level with factor(). Had I just read the original posts, I wouldn't have known about the ability to do this within ggplot. I'm okay either way, but I would hate for someone with the same issue to see this question marked as a dupe, go to the other link, and come away thinking they have to specify the levels manually. – J.Sabree Mar 16 '22 at 15:09
  • Okay, J.Sabree, I've un-duped the question. Your point about the use of `lubridate::my` is relevant, which means the issue that is "novel" is how to order `"%b %Y"` strings and is distinct from `ggplot2`-based problems. Perhaps I inferred too much into the `ggplot2` connection (as it is good for depiction but a red herring for the underlying problem).
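Since the underlying problem is ordering `"%b %Y"` strings, the factor levels do not have to be typed out by hand. A minimal sketch, assuming lubridate is installed and the labels use English month abbreviations, that derives the levels from the parsed dates (the vector `month` is a small hypothetical example, not the question's full data):

```r
library(lubridate)

# Hypothetical "%b %Y" labels, out of order and with a repeat
month <- c("Mar 2021", "Jan 2021", "Feb 2021", "Jan 2021")

# my() parses each label to the first day of that month as a Date;
# ordering by those dates and keeping unique labels gives chronological levels
month_levels <- unique(month[order(my(month))])
month_f <- factor(month, levels = month_levels)

print(levels(month_f))
```

The same two lines can be applied to `test$month` (for example inside `mutate()`) before plotting, so the axis order is fixed in the data rather than in the `aes()` call.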

1 Answer


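The reorder() function (stats package) can be used to sort factor levels. reorder(x, X) orders the levels of x by a summary of X within each level (the mean, by default), so passing my(month) as the second argument sorts the months by their parsed dates, i.e. chronologically. So I believe this does what you need: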

library(lubridate)  # provides my()
ggplot(test, aes(x = reorder(month, my(month)), y = values)) + geom_col()
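To see what reorder() is doing, here is a minimal sketch on a small hypothetical vector of labels, assuming lubridate is installed and an English locale for the month abbreviations. Each label is parsed by my() to the first day of its month, and reorder() sorts the levels by the mean of those dates within each label:

```r
library(lubridate)

# Hypothetical "%b %Y" labels, out of order and with a repeat
month <- c("Mar 2021", "Jan 2021", "Feb 2021", "Jan 2021")

# Second argument: one Date per label; reorder() averages it per level
# (all dates within a level are identical here) and sorts levels by that score
month_f <- reorder(month, my(month))

print(levels(month_f))
```

Because the x aesthetic is an expression, ggplot2 titles the axis with the literal text `reorder(month, my(month))`; adding `labs(x = "Month")` to the plot gives it a readable title.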
sashahafner