
So I have a variable called date, and I pulled out the month and year using these two lines of code:

library(lubridate)  # assumed: month() and year() here come from lubridate
month_of_date <- month(as.POSIXlt(housing_data$date, format="%Y-%m-%d"))
year_of_date <- year(as.POSIXlt(housing_data$date, format="%Y-%m-%d"))

Then I combined them with this line of code:

month_year_of_date <- paste(month_of_date, year_of_date, sep = "/")

How can I aggregate the data to a month/year level, and graph the month/year on the X Axis so that it is in order?

Here is the graph I have so far, but the months on the x axis are not in order.

[Screenshot: current plot, with the month/year labels out of chronological order]

Code for the graph:

ggplot(housing_data, aes(x = factor(month_year_of_date), y = housing_data$price)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  geom_line()
SRVFan
Andy

3 Answers


Assuming housing_data is as given in the Note at the end, these two lines convert it to a zoo object with a yearmon index and then plot it using autoplot.zoo.

library(ggplot2)
library(zoo)

z <- read.zoo(housing_data, index = "date", FUN = as.yearmon)
autoplot(z, geom = "blank", width = .01) + geom_bar(stat = "identity") + scale_x_yearmon()

[Screenshot of the resulting plot]

Note

housing_data <- data.frame(
  price = 1:12,
  date = c("2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
           "2000-05-01", "2000-06-01", "2000-07-01", "2000-08-01",
           "2000-09-01", "2000-10-01", "2000-11-01", "2000-12-01")
)
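If you would rather not depend on zoo, a base-R sketch of the same idea is to floor each date to the first day of its month and keep it as a Date, which sorts chronologically. The data below is hypothetical (several rows per month, deliberately out of order) so the aggregation step actually does something; the column name month_start and the choice of mean are assumptions, not part of the answer above.

```r
# Hypothetical sample data: several rows per month, out of order
housing_data <- data.frame(
  price = c(300, 250, 400, 350, 500),
  date  = c("2000-11-15", "2000-02-03", "2000-11-02", "2000-09-20", "2000-02-28")
)

# Floor each date to the 1st of its month; Date values sort chronologically
housing_data$month_start <- as.Date(format(as.Date(housing_data$date), "%Y-%m-01"))

# One row per month (mean is an assumed summary; use sum, median, ... as needed)
monthly <- aggregate(price ~ month_start, data = housing_data, FUN = mean)
print(monthly)
```

Because month_start is a Date rather than a label, a ggplot2 Date axis (for example with scale_x_date(date_labels = "%m/%Y")) keeps the months in order and spaces them by time.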
G. Grothendieck

Minor note: this is one of the reasons I really dislike month-first date representations. If you can stomach having year/month, year-month, or something similarly ordered (with zero-padded months), the labels sort correctly as plain strings and none of this is necessary ... but I digress.

The fix has nothing to do with ggplot2 itself, though ggplot2 benefits from it. Since you're already using factor, it's even easier: when you define a factor's levels, you define its order, and ggplot2 draws a discrete axis in level order.
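A minimal illustration with a few hand-picked labels (month_labels is a hypothetical name, not the question's data): without explicit levels, factor() sorts the strings alphabetically, so September lands after November; supplying levels fixes that.

```r
month_labels <- c("9/2018", "11/2018", "10/2018", "9/2018")

# Default: levels are the sorted unique strings, i.e. alphabetical, not chronological
default_levels <- levels(factor(month_labels))
print(default_levels)  # "10/2018" "11/2018" "9/2018"

# Explicit levels: the order given here is the order used by sort() and ggplot2
ordered_factor <- factor(month_labels, levels = c("9/2018", "10/2018", "11/2018"))
print(sort(ordered_factor))
```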

Two methods:

  1. Using the data provided, with no extra levels.

    set.seed(2)
    random_dates <- as.Date(Sys.Date() + sample(1000, size=20))
    month_of_date <- lubridate::month(random_dates)
    year_of_date <- lubridate::year(random_dates)
    month_year_of_date <- paste(month_of_date, year_of_date, sep = "/")
    month_year_of_date
    #  [1] "11/2018" "4/2020"  "11/2019" "10/2018" "11/2020" "11/2020" "9/2018" 
    #  [8] "8/2020"  "8/2019"  "10/2019" "10/2019" "12/2018" "5/2020"  "10/2018"
    # [15] "6/2019"  "8/2020"  "12/2020" "12/2018" "7/2019"  "7/2018" 
    

    These are out of order, so we use order() on the year and month variables:

    ordered_month_year_of_date <- unique(month_year_of_date[ order(year_of_date, month_of_date) ])
    ordered_month_year_of_date
    #  [1] "7/2018"  "9/2018"  "10/2018" "11/2018" "12/2018" "6/2019"  "7/2019" 
    #  [8] "8/2019"  "10/2019" "11/2019" "4/2020"  "5/2020"  "8/2020"  "11/2020"
    # [15] "12/2020"
    

    Now define the factor:

    month_year_of_date <- factor(month_year_of_date, levels = ordered_month_year_of_date)
    
  2. Define the full range of months; this gives more levels, but if you expect to expand the dataset at some point, all the months in between will already be covered.

    set.seed(2)
    random_dates <- as.Date(Sys.Date() + sample(1000, size=20))
    month_of_date <- lubridate::month(random_dates)
    year_of_date <- lubridate::year(random_dates)
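    # floor to the 1st: seq() by month from e.g. the 31st can skip or drop months
    first_month <- as.Date(format(min(random_dates), "%Y-%m-01"))
    ordered_date_range <- format(seq(first_month, max(random_dates), by = "month"),
                                 format = "%m/%Y")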
    head(ordered_date_range)
    # [1] "07/2018" "08/2018" "09/2018" "10/2018" "11/2018" "12/2018"
    

    The leading zero will flummox factor(): "07/2018" does not match the "7/2018" built by paste(), so those values would become NA. Remove it:

    ordered_date_range <- gsub("^0", "", ordered_date_range)
    head(ordered_date_range)
    # [1] "7/2018"  "8/2018"  "9/2018"  "10/2018" "11/2018" "12/2018"
    month_year_of_date <- factor(paste(month_of_date, year_of_date, sep = "/"),
                                 levels = ordered_date_range)
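To see why the leading zero matters: a value that does not exactly match any level becomes NA. A small sketch with hand-picked dates (padded_levels, unpadded and fixed_levels are hypothetical names):

```r
# Zero-padded labels, as format(..., "%m/%Y") produces them
padded_levels <- format(as.Date(c("2018-07-01", "2018-08-01", "2018-09-01")), "%m/%Y")
print(padded_levels)  # "07/2018" "08/2018" "09/2018"

# paste() on integer months gives unpadded labels
unpadded <- paste(c(7, 9), 2018, sep = "/")  # "7/2018" "9/2018"

# No level matches, so every value becomes NA
print(factor(unpadded, levels = padded_levels))

# After stripping the leading zero, the labels line up
fixed_levels <- gsub("^0", "", padded_levels)
print(factor(unpadded, levels = fixed_levels))
```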
    

From here, sorting "just works":

month_year_of_date
#  [1] 11/2018 4/2020  11/2019 10/2018 11/2020 11/2020 9/2018  8/2020  8/2019 
# [10] 10/2019 10/2019 12/2018 5/2020  10/2018 6/2019  8/2020  12/2020 12/2018
# [19] 7/2019  7/2018 
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
sort(month_year_of_date)
#  [1] 7/2018  9/2018  10/2018 10/2018 11/2018 12/2018 12/2018 6/2019  7/2019 
# [10] 8/2019  10/2019 10/2019 11/2019 4/2020  5/2020  8/2020  8/2020  11/2020
# [19] 11/2020 12/2020
# 30 Levels: 7/2018 8/2018 9/2018 10/2018 11/2018 12/2018 1/2019 ... 12/2020
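Other factor-aware functions pick up the order too. For instance, table() counts rows per month in level order and, with the full-range levels from method 2, also reports months that have no rows; droplevels() removes those while keeping the order. A sketch with hand-picked labels (month_year is a hypothetical name):

```r
month_year <- factor(c("9/2018", "7/2018", "9/2018"),
                     levels = c("7/2018", "8/2018", "9/2018"))

# Counts in chronological order, including the empty 8/2018
counts <- table(month_year)
print(counts)

# Drop months with no rows; the remaining levels keep their order
present <- droplevels(month_year)
print(levels(present))  # "7/2018" "9/2018"
```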

That makes your (completely untested) plotting code something like:

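library(ggplot2)

housing_data$month_year <- month_year_of_date  # assumes it was built from housing_data$date
ggplot(housing_data, aes(x = month_year, y = price, group = 1)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  geom_line()

(i.e., no factor() call, since the levels are already defined. group = 1 tells geom_line() to connect points across the discrete x values instead of treating each month as its own group.)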
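The question also asks about aggregating to one value per month. A minimal sketch, assuming a hypothetical housing_data with price and a month_year factor that already carries chronological levels: aggregate() returns the groups in level order, so the result is already sorted. Mean is an assumed choice of summary.

```r
# Hypothetical sample data; month_year already has chronological levels
housing_data <- data.frame(
  price = c(200, 180, 220, 210, 190),
  month_year = factor(c("10/2018", "9/2018", "10/2018", "1/2019", "9/2018"),
                      levels = c("9/2018", "10/2018", "11/2018", "12/2018", "1/2019"))
)

# One row per month present in the data, in level (chronological) order
monthly <- aggregate(price ~ month_year, data = housing_data, FUN = mean)
print(monthly)
```

The monthly data frame can then go into the same ggplot() call, with one point per month.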

r2evans

month() drops the leading zero from the month value: for instance, March comes back as 3, not "03". To keep the zero, extract the year and month with format() instead. Putting the year first also means the resulting labels sort chronologically as plain strings.

year_of_date <- format(as.POSIXlt(housing_data$date, format="%Y-%m-%d"),"%Y")
month_of_date <- format(as.POSIXlt(housing_data$date, format="%Y-%m-%d"),"%m")

month_year_of_date <- paste(year_of_date, month_of_date,  sep = "/")
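With year-first, zero-padded labels, sort(), factor() and aggregate() all put the months in order without any explicit levels. A sketch with hypothetical labels and prices in the "YYYY/MM" format the code above produces (sales and monthly are illustrative names):

```r
sales <- data.frame(
  month_year = c("2019/01", "2018/11", "2018/02", "2018/11"),
  price      = c(210, 200, 150, 220)
)

# Plain string sorting is already chronological
print(sort(unique(sales$month_year)))  # "2018/02" "2018/11" "2019/01"

# Grouping by the string labels returns the months in that same order
monthly <- aggregate(price ~ month_year, data = sales, FUN = mean)
print(monthly)
```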
Naveen