I have a data frame DF which has a column Month as a character string using the full English name of the month, and a column Year as numeric:

Year Month {several xi}        
2016 April {numeric} 

I need to plot several of the xi as a time series. What is the most efficient way to sort this data frame from the earliest month (January 2015) to the present? My attempts to convert "month" into a date-classed object using as.Date are not working as I'd like; they keep coming back sorted alphabetically.

Apologies if this is a noob question, but by sheer bad luck I have not had to work with date-class objects very often in my R career, so I'm not sure which of the various similar questions I am seeing can help me.

mmyoung77
  • `month.name` is a built-in constant with the months in the correct order. Simply do `df$Month = factor(df$Month, levels = month.name)` to create a factor with the proper ordering. You can then `df[order(df$Year, df$Month), ]`. – Gregor Thomas May 23 '16 at 21:54
  • You won't be able to make it a `Date` without a day - you could use `1` for the day. You could use the `yearmon` class of the `zoo` package, almost exactly [as in this question](http://stackoverflow.com/q/6242955/903061), but you'll need to use `%B` instead of `%m` since you have unabbreviated month names. See `?strptime` for other datepart wildcards. – Gregor Thomas May 23 '16 at 21:59
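
The factor approach from the first comment can be sketched in base R as follows. This is a minimal illustration, not the asker's actual data: the toy data frame and the column `x1` (standing in for one of the xi) are assumptions.

```r
# Hypothetical toy data; x1 stands in for one of the "xi" columns
df <- data.frame(
  Year  = c(2016, 2015, 2016, 2015),
  Month = c("April", "January", "February", "December"),
  x1    = c(4.2, 1.0, 3.1, 2.5),
  stringsAsFactors = FALSE
)

# month.name is a built-in constant: "January", ..., "December".
# Using it as the factor levels gives the months their calendar order.
df$Month <- factor(df$Month, levels = month.name)

# Sort by year first, then by the calendar position of the month
df_sorted <- df[order(df$Year, df$Month), ]
```

Because `order()` sorts a factor by its level positions rather than its labels, the months no longer come back alphabetically. This needs no extra packages, but it does not produce a date-classed column.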

1 Answer

I concur with Gregor's suggestion of using the zoo package. I think it is good practice to combine dates into one variable. If you ever need to extract information about only the year or month you can use the lubridate package. Here is a simple example of how to use zoo.

library(zoo)

# Toy data set
d <- data.frame(
  Month = c("March", "April", "May", "March"),
  Year  = c(2008, 1998, 1997, 1999),
  stringsAsFactors = FALSE
)

# Build a single yearmon column from the full month name and the year
d$my <- as.yearmon(paste(d$Month, d$Year), "%B %Y")

# Order the rows chronologically
d <- d[order(d$my), ]

It is safest if the month and year variables in your data frame are not factors: Month should be of character class and Year of numeric or integer class.

One note: if you plan to use ggplot2 instead of plot(), you'll need zoo's scale_x_yearmon() for the x axis.

Finally, you mention that you had trouble with as.Date. As Gregor notes, this is because as.Date expects a format that contains a day, a month, and a year. In your case you can therefore insert an arbitrary day, for example as.Date(paste(d$Month, 1, d$Year), "%B %d %Y"). For a complete list of the date format codes, see ?strptime.
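
As a rough sketch of the as.Date route without zoo, the same toy data can be sorted like this. Two assumptions to flag: the `date` column name is made up for illustration, and `%B` matches month names in the current locale, so an English locale is assumed.

```r
# Same toy data as in the zoo example, with Year as numeric
d <- data.frame(
  Month = c("March", "April", "May", "March"),
  Year  = c(2008, 1998, 1997, 1999),
  stringsAsFactors = FALSE
)

# Use the 1st of the month as a placeholder day so as.Date has a full date.
# %B = full month name (locale-dependent), %d = day, %Y = four-digit year.
d$date <- as.Date(paste(d$Month, 1, d$Year), format = "%B %d %Y")

# Order the rows chronologically
d <- d[order(d$date), ]
```

If any month name fails to parse (for example under a non-English locale), as.Date returns NA for that row, so it is worth checking `anyNA(d$date)` before plotting.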

Jacob H
  • Thanks to all responders. Both ordering by `month.name` and using `as.Date(paste(d$Month, 1, d$Year), "%B %d %Y")` worked like a charm. The file is quite big and I'm importing a fair number of packages already, so I avoided using `zoo`, but I'll keep it in mind. – mmyoung77 May 24 '16 at 14:55