1

I have a variable which is the concatenation of month and year, in a numeric format. The month is in format 1-12, not 01-12.

My variable looks like:

mmyyyy
12014
22014
102014
52015
112015

I am looking for a regexp to match the month or the year only:

For the year, I did something like:

year <- ifelse(grepl("2014", mmyyyy), 2014, ifelse(grepl("2015", mmyyyy), 2015, 2016))

But for the month, I am struggling. My first thought is to replace 2014, 2015, etc. with a blank and then convert the result to numeric.

month <- as.numeric(gsub("[[^2014]]", "", mmyyyy))

but here, I can't find a suitable regular expression.

In the end, I would like one vector with the numeric year (yyyy) and one vector with the numeric month.

YCR

7 Answers

7

A possible solution using tidyr, which creates both the month and year columns in a single call:

library(tidyr)
extract(df, mmyyyy, c("month", "year"), "(\\d+)(\\d{4})", convert = TRUE)
#   month year
# 1     1 2014
# 2     2 2014
# 3    10 2014
# 4     5 2015
# 5    11 2015

Data

df <- data.frame(mmyyyy = c(12014,
                            22014,
                            102014,
                            52015,
                            112015))
David Arenburg
6

One option is

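# x is assumed to hold the mmyyyy values from the question
x <- c(12014, 22014, 102014, 52015, 112015)
# for the months: keep everything before the last four digits
as.numeric(gsub("(.*)[0-9]{4}$", "\\1", x))
#[1]  1  2 10  5 11
# for the years: keep only the last four digits
as.numeric(gsub(".*([0-9]{4})$", "\\1", x))
#[1] 2014 2014 2014 2015 2015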

This works for any 4-digit year.

talat
  • 1
    @PalashKumar, `[0-9]{4}$` represents the last 4 numeric digits and `.*` means any character, any number of times. Then, `(...)` is a capture group that you can extract. In both cases I extract the first capture group but change the position of the `(...)`. – talat Dec 30 '15 at 11:36
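
The capture-group idea described in the comment can also be sketched with both groups in a single pattern. This is only an illustration in base R, not part of the answer above; the names `s` and `parts` are made up for the example, and it assumes every value ends in a 4-digit year.

```r
x <- c(12014, 22014, 102014, 52015, 112015)
s <- as.character(x)

# group 1: the leading month digits, group 2: the last four digits (the year)
parts <- regmatches(s, regexec("^([0-9]+)([0-9]{4})$", s))

# each list element holds the full match followed by the two groups
month <- as.numeric(sapply(parts, `[`, 2))
year  <- as.numeric(sapply(parts, `[`, 3))
```

The first group is greedy, but the regex engine backtracks so that exactly four digits remain for the second group.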
4
mmyyyy <- c(12014,22014,102014, 52015, 112015)

Make a 6-digit character vector using sprintf, which zero-pads single-digit months:

dates <- sprintf("%06d", mmyyyy)

You could use the yearmon function from the zoo package

library(zoo)
dates1 <- as.yearmon(dates, format = "%m%Y")
format(dates1, "%m")
# [1] "01" "02" "10" "05" "11"
format(dates1, "%Y")
# [1] "2014" "2014" "2014" "2015" "2015"

EDIT: Updated as per @David's comments

Ronak Shah
1

You could use the package unglue:

df <- data.frame(mmyyyy = c(12014, 22014, 102014, 52015, 112015))
library(unglue)
unglue_unnest(df, mmyyyy, "{month}{year=\\d{4}}", convert = TRUE)
#>   month year
#> 1     1 2014
#> 2     2 2014
#> 3    10 2014
#> 4     5 2015
#> 5    11 2015
moodymudskipper
0

How about something like the below (assuming you are only dealing with years of the form 20xx):

month <- as.numeric(gsub("20[0-9]+", "", mmyyyy))
Sam Gilbert
0

I don't really know how to do regex, but here's some simple code. It will work for all years up to 9999 :)

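dmmyyyy <- c("12014", "22014", "102014", "52015", "112015")
dmmyyyy <- as.character(dmmyyyy)
# the last four characters are the year
year <- substr(dmmyyyy, nchar(dmmyyyy) - 4 + 1, nchar(dmmyyyy))
year
# [1] "2014" "2014" "2014" "2015" "2015"
# everything before the last four characters is the month
month <- substr(dmmyyyy, 1, nchar(dmmyyyy) - 4)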
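
Since the values in the question are numeric, another non-regex split can be sketched with integer arithmetic. This is an alternative technique, not what the answer above does, and it assumes the year always occupies the last four digits.

```r
mmyyyy <- c(12014, 22014, 102014, 52015, 112015)

# integer division by 10000 drops the four year digits
month <- mmyyyy %/% 10000
# the remainder after dividing by 10000 is the four-digit year
year <- mmyyyy %% 10000
```

Both results are already numeric, so no conversion to or from character is needed.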
CuriousBeing
-1

Extracting the last n characters from a string in R

Why not split off the last characters as the year? See str_sub from the stringr package.
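
A minimal sketch of the str_sub idea, assuming the stringr package is installed and the values are converted to character first (the name `s` is made up for the example). Negative positions in str_sub count from the end of the string.

```r
library(stringr)

mmyyyy <- c(12014, 22014, 102014, 52015, 112015)
s <- as.character(mmyyyy)

# the last four characters are the year
year <- as.numeric(str_sub(s, -4, -1))
# everything up to the fifth character from the end is the month
month <- as.numeric(str_sub(s, 1, -5))
```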

Palash Kumar