Minor note: this is one of the reasons I really dislike month-first date representations. If you can stomach year/month or year-month with zero-padded months (2018/07, 2018-07), or something similarly ordered, this would not be necessary ... but I digress.
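To see why year-first labels avoid the problem, here is a small sketch with made-up dates. Zero-padded year-first strings sort chronologically as plain text, while month-first strings do not:

```r
# Hypothetical dates, used only to compare the two label styles
d <- as.Date(c("2019-11-03", "2018-07-20", "2020-04-01", "2018-10-05"))

# Month-first labels sort as plain text, so the month is compared before the year
month_first <- format(d, "%m/%Y")
sort(month_first)
# -> "04/2020" "07/2018" "10/2018" "11/2019"

# Year-first, zero-padded labels sort chronologically as plain text
year_first <- format(d, "%Y-%m")
sort(year_first)
# -> "2018-07" "2018-10" "2019-11" "2020-04"
```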
The way to solve it has nothing to do with ggplot2, though ggplot2 will benefit from the fix. Since you're already using factor(), it's even easier: when you define a factor's levels, you implicitly define its order.
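A minimal sketch of that point, using a few made-up labels: by default factor() takes the sorted unique strings as levels (alphabetical, which is wrong for these labels), and supplying levels explicitly is what fixes the order that sort() and ggplot2 use.

```r
# A few month/year labels, deliberately out of order
month_year <- c("10/2018", "9/2018", "7/2018")

# Default levels are the sorted unique strings, i.e. alphabetical order
levels(factor(month_year))
# -> "10/2018" "7/2018" "9/2018"

# Supplying the levels explicitly defines the order
f <- factor(month_year, levels = c("7/2018", "9/2018", "10/2018"))
as.character(sort(f))
# -> "7/2018" "9/2018" "10/2018"
```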
Two methods:
First, using only the month/year combinations present in the data (no extra levels):
set.seed(2)
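# the offsets are reproducible, but they are added to today's date, so your output will differ from what is shown here
random_dates <- as.Date(Sys.Date() + sample(1000, size = 20))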
month_of_date <- lubridate::month(random_dates)
year_of_date <- lubridate::year(random_dates)
month_year_of_date <- paste(month_of_date, year_of_date, sep = "/")
month_year_of_date
# [1] "11/2018" "4/2020" "11/2019" "10/2018" "11/2020" "11/2020" "9/2018"
# [8] "8/2020" "8/2019" "10/2019" "10/2019" "12/2018" "5/2020" "10/2018"
# [15] "6/2019" "8/2020" "12/2020" "12/2018" "7/2019" "7/2018"
These are out of order, so we use order() on the year and month variables (year first, then month):
ordered_month_year_of_date <- unique(month_year_of_date[ order(year_of_date, month_of_date) ])
ordered_month_year_of_date
# [1] "7/2018" "9/2018" "10/2018" "11/2018" "12/2018" "6/2019" "7/2019"
# [8] "8/2019" "10/2019" "11/2019" "4/2020" "5/2020" "8/2020" "11/2020"
# [15] "12/2020"
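If the two-key order() call is unfamiliar, here is a toy sketch with hypothetical year and month values: order() sorts by the first vector and breaks ties with the second (ties keep their original order), and unique() keeps the first occurrence of each label, so the chronological order survives de-duplication.

```r
# Toy year and month vectors (hypothetical values)
yr <- c(2019, 2018, 2018, 2018)
mo <- c(1, 12, 3, 12)
lbl <- paste(mo, yr, sep = "/")

# order() sorts by yr first and breaks ties with mo
idx <- order(yr, mo)
idx
# -> 3 2 4 1

# unique() keeps the first occurrence of each label, preserving that order
unique(lbl[idx])
# -> "3/2018" "12/2018" "1/2019"
```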
Now define the factor:
month_year_of_date <- factor(month_year_of_date, levels = ordered_month_year_of_date)
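If you need this in more than one place, the first method can be wrapped in a function. This is a sketch only: month_year_factor() is a hypothetical name, and it swaps lubridate for base R's format() to pull out the month and year.

```r
# Hypothetical helper: the first method as a function, using base R format()
# instead of lubridate to extract the month and year
month_year_factor <- function(dates) {
  mo <- as.integer(format(dates, "%m"))
  yr <- as.integer(format(dates, "%Y"))
  lbl <- paste(mo, yr, sep = "/")
  # levels: only the labels that occur in the data, in chronological order
  factor(lbl, levels = unique(lbl[order(yr, mo)]))
}

f <- month_year_factor(as.Date(c("2019-01-15", "2018-12-01", "2018-03-10")))
levels(f)
# -> "3/2018" "12/2018" "1/2019"
```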
Second, define the full set of months between the earliest and latest dates. This gives more levels, but if you expect to expand the dataset at some point, every month in between is already covered.
set.seed(2)
random_dates <- as.Date(Sys.Date() + sample(1000, size=20))
month_of_date <- lubridate::month(random_dates)
year_of_date <- lubridate::year(random_dates)
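# start both ends at the first of their month so seq() can't drop the last month
month_starts <- lubridate::floor_date(range(random_dates), unit = "month")
ordered_date_range <- format(seq(month_starts[1], month_starts[2], by = "month"),
                             format = "%m/%Y")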
head(ordered_date_range)
# [1] "07/2018" "08/2018" "09/2018" "10/2018" "11/2018" "12/2018"
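Why start the sequence at the first of a month? seq() with by = "month" keeps the start date's day of the month, so stepping from a mid-month date can stop one month short of the end date. A sketch with made-up dates (first_of_month() is a hypothetical helper, not part of any package):

```r
# Hypothetical range: starts mid-month, ends earlier in its month
from <- as.Date("2018-07-15")
to   <- as.Date("2018-12-03")

# seq() steps through the 15th of each month, so it stops at 15 Nov
format(seq(from, to, by = "month"), "%m/%Y")
# -> "07/2018" "08/2018" "09/2018" "10/2018" "11/2018"

# Moving both ends to the first of their month covers every month
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))
format(seq(first_of_month(from), first_of_month(to), by = "month"), "%m/%Y")
# -> "07/2018" "08/2018" "09/2018" "10/2018" "11/2018" "12/2018"
```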
The leading zero won't match the labels built with paste() ("07/2018" vs "7/2018"), so factor() would turn those values into NA; we'll remove it:
ordered_date_range <- gsub("^0", "", ordered_date_range)
head(ordered_date_range)
# [1] "7/2018" "8/2018" "9/2018" "10/2018" "11/2018" "12/2018"
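A quick check of what goes wrong otherwise, with made-up labels: a value that isn't one of the levels silently becomes NA, and stripping the zero makes the labels line up.

```r
# A level with a leading zero does not match the unpadded label,
# so factor() silently turns the value into NA
bad <- factor("7/2018", levels = "07/2018")
is.na(bad)
# -> TRUE

# Stripping the leading zero makes the labels line up
lv <- gsub("^0", "", c("07/2018", "10/2018"))
lv
# -> "7/2018" "10/2018"
good <- factor(c("10/2018", "7/2018"), levels = lv)
```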
month_year_of_date <- factor(paste(month_of_date, year_of_date, sep = "/"),
                             levels = ordered_date_range)
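The second method can likewise be wrapped up without lubridate. A sketch under that assumption: full_month_levels() is a hypothetical helper, and converting the "%m" part with as.integer() avoids the leading zero instead of removing it with gsub().

```r
# Hypothetical helper: every month from the earliest to the latest date,
# labelled "month/year" without a leading zero (base R only)
full_month_levels <- function(dates) {
  first <- as.Date(format(min(dates), "%Y-%m-01"))
  last  <- as.Date(format(max(dates), "%Y-%m-01"))
  months <- seq(first, last, by = "month")
  # as.integer() drops the leading zero that "%m" produces
  paste(as.integer(format(months, "%m")), format(months, "%Y"), sep = "/")
}

dates <- as.Date(c("2019-02-03", "2018-11-20"))
lv <- full_month_levels(dates)
lv
# -> "11/2018" "12/2018" "1/2019" "2/2019"
f <- factor(paste(as.integer(format(dates, "%m")), format(dates, "%Y"), sep = "/"),
            levels = lv)
```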
From here, sorting "just works":
month_year_of_date
# [1] 11/2018 4/2020 11/2019 10/2018 11/2020 11/2020 9/2018 8/2020 8/2019
# [10] 10/2019 10/2019 12/2018 5/2020 10/2018 6/2019 8/2020 12/2020 12/2018
# [19] 7/2019 7/2018
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
sort(month_year_of_date)
# [1] 7/2018 9/2018 10/2018 10/2018 11/2018 12/2018 12/2018 6/2019 7/2019
# [10] 8/2019 10/2019 10/2019 11/2019 4/2020 5/2020 8/2020 8/2020 11/2020
# [19] 11/2020 12/2020
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
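With the full set of levels, months that have no observations are still levels of the factor. A small sketch with made-up values shows what that means for counts, and how droplevels() removes the empty months (keeping the remaining order) if you don't want them:

```r
# Made-up factor with an unused level (8/2018)
f <- factor(c("9/2018", "7/2018", "9/2018"),
            levels = c("7/2018", "8/2018", "9/2018"))

# Unused levels keep their place in the order and count as zero
counts <- table(f)
# 7/2018: 1, 8/2018: 0, 9/2018: 2

# droplevels() removes them but keeps the remaining order
levels(droplevels(f))
# -> "7/2018" "9/2018"
```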
This makes your (completely untested) plotting code something like:
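# assumes month_year_of_date and price are columns of housing_data;
# group = 1 makes geom_line() join the points across the factor's levels
ggplot(housing_data, aes(x = month_year_of_date, y = price, group = 1)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  geom_line()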
(i.e., no factor() call in the plotting code, since that has already been done).
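To try the plot without the real data, here is a sketch with a small made-up data frame; housing_demo and its prices are hypothetical stand-ins for housing_data.

```r
library(ggplot2)

# Hypothetical stand-in for housing_data: one price per month
housing_demo <- data.frame(
  month_year_of_date = factor(c("7/2018", "8/2018", "9/2018"),
                              levels = c("7/2018", "8/2018", "9/2018")),
  price = c(100, 105, 103)
)

# With a discrete x, each level forms its own group, so group = 1 is
# needed for geom_line() to join the points into a single line
p <- ggplot(housing_demo, aes(x = month_year_of_date, y = price, group = 1)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
```

Printing p draws the line with the months in level order.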