1

I have a data set with a character column for months (MONTH) and a numeric column indicating years (YEAR). In order to work with it as panel data, I need to unite these YEAR and MONTH into a variable with a date format.

I have tried to change the variable MONTH to numeric format and then to merge MONTH with the column YEAR. R would not recognize it as a date variable.

It currently looks like this:

  STATE          MONTH     YEAR     VALUE
California        JAN      2018      800
California        FEB      2018      780
California        MAR      2018      600
    ...           ...       ...      ...
Minesota          JAN      2018      800
Minesota          FEB      2018      780
Minesota          MAR      2018      600
    ...           ...       ...      ...

I want it like this:

  STATE          TIME        VALUE
California     01-2018        800
California     02-2018        780
California     03-2018        600
    ...           ...         ...
Minesota       01-2018        800
Minesota       02-2018        780
Minesota       03-2018        600
    ...           ...         ...
Balthasar
  • 197
  • 2
  • 14
  • A date requires day, month and year. So you either add the 1st of each month as an arbitrary day before date formatting, or use `zoo::as.yearmon` to convert to "year-month" objects. See the answer by @tim-biegeleisen for the first approach. – neilfws Jan 24 '19 at 21:23

5 Answers5

3

I would recommend handling this by going through bona-fide R dates, using as.Date to generate an R date, then using format to render the string output you want. Something like this:

df$TIME <- format(as.Date(paste0(df$MONTH, df$YEAR, "01"), format="%b%Y%d"), "%m-%Y")

I arbitrarily assign the first to each date in your data set, but this doesn't matter, because the call to format only includes the month and year.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
1

In base R you could do something like:

transform(df,TIME = paste(sprintf('%02d',match(MONTH,toupper(month.abb))),YEAR,sep = '-'))[c(1,5,4)]
       STATE    TIME VALUE
1 California 01-2018   800
2 California 02-2018   780
3 California 03-2018   600
4        ...  NA-...   ...
5   Minesota 01-2018   800
6   Minesota 02-2018   780
7   Minesota 03-2018   600
Onyambu
  • 67,392
  • 3
  • 24
  • 53
0

Combining Tim's response with an easy to use date package lubridate we get:

# This can handle months of JAN, FEB, ETC. Or it can handle months of 01,02,etc.
df$TIME <- lubridate::ymd(paste0(df$YEAR,df$MONTH,"01")) 

# or if you need it in MM-YYYY format:
df$TIME <- format(lubridate::ymd(paste0(df$YEAR,df$MONTH,"01")), "%m-%Y")
Adam Sampson
  • 1,971
  • 1
  • 7
  • 15
0

You could simplify the below, but it makes it easier to see what's going on:

library(lubridate)
library(tidyverse)

df2 <- df %>% 
  mutate(TIME = parse_date_time(paste0(MONTH, YEAR), orders = "%b%Y"),
         TIME = as.character(substr(TIME, 6, 7)),
         TIME = paste0(TIME, "-", YEAR))

This is using lubridate - the easiest way to parse dates in R IMO, dplyr from tidyverse and substr from base R.

If you want to keep the date column then just pipe in another mutate and call the new column something different.

nycrefugee
  • 1,629
  • 1
  • 10
  • 23
0

If you wish to use a full-on Tidyverse solution, consider this combination of tidyr, and lubridate's parse_date_time:

library(tidyverse)
df <- tibble::tribble(
  ~STATE,      ~MONTH,      ~YEAR,   ~VALUE,
"California",     "JAN",      2018,      800,
"California",     "FEB",      2018,      780,
"California",     "MAR",      2018,      600,
"Minesota",       "JAN",      2018,      800,
"Minesota",       "FEB",      2018,      780,
"Minesota",       "MAR",      2018,      600)

df %>%
   tidyr::unite(TIME, c(MONTH, YEAR), sep = "-") %>%
   dplyr::mutate(TIME = lubridate::parse_date_time(TIME, "my"))
#> # A tibble: 6 x 3
#>   STATE      TIME                VALUE
#>   <chr>      <dttm>              <dbl>
#> 1 California 2018-01-01 00:00:00   800
#> 2 California 2018-02-01 00:00:00   780
#> 3 California 2018-03-01 00:00:00   600
#> 4 Minesota   2018-01-01 00:00:00   800
#> 5 Minesota   2018-02-01 00:00:00   780
#> 6 Minesota   2018-03-01 00:00:00   600

Also check out the following related question: Converting year and month ("yyyy-mm" format) to a date?

Balthasar
  • 197
  • 2
  • 14