2

I have a data frame of a few columns, the last one is called a Filename. This is how it looks like.

    Product Company Filename
       …        …       mg-tvd_bmmh_20170930.csv
       …        …       mg-tvd_bmmh_2016_06_13.csv
       …        …       …

I am trying to write a short script in R which takes dates from a filename and transforms it into a new column which I call a Date. So a new data frame would look like this:

     Product    Company   Date          Filename
       …          …       09/30/2017    mg-tvd_bmmh_20170930.csv
       …          …       16/13/2017    mg-tvd_bmmh_2016_06_13.csv
       …          …        …                …   

This is a relevant piece of my script.

   df <- mutate(df, Date <- grep(pattern = "(\d{4})_?(\d{2})_?
   (\d{1,2})", df$Filename, value = TRUE))
   ddf$Date <- as.Date(Date,format = "%m/%d/%y")

Any advice why I can't get it working?

I am getting these errors:

 Error: '\d' is an unrecognized escape in character string starting ""(\d"
 Error in as.Date(Date, format = "%m/%d/%y") :
   object 'Date' not found
oguz ismail
  • 1
  • 16
  • 47
  • 69
Lina Linutina
  • 363
  • 1
  • 2
  • 17
  • where is it erroring? it looks like your date format line should be ddf$Date <- as.Date(Date,format = "%Y%m%d") – Preston Oct 17 '17 at 12:15
  • > Error: '\d' is an unrecognized escape in character string starting ""(\d" > Error in as.Date(Date, format = "%m/%d/%y") : object 'Date' not found – Lina Linutina Oct 17 '17 at 15:04

3 Answers3

3

You can use this command:

transform(df, Date = as.Date(sub(".*\\D(\\d{4})_?(\\d{2})_?(\\d{1,2}).*",
                                 "\\1\\2\\3", Filename), "%Y%m%d"))
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
1

You are getting the error because instead of:

ddf$Date <- as.Date(Date,format = "%m/%d/%y")

you should have:

df$Date <- as.Date(df$Date,format = "%Y/%m/%d")

or:

df %>%
  mutate(Date = as.Date(df$Date,format = "%Y/%m/%d"))

The incorrect specification of format = "%m/%d/%y" would give you NA values in Date while the incorrect reference of as.Date(Date, ... would throw you the error.

You can also use str_extract from stringr to extract the dates and ymd from lubridate to parse it to Date object:

library(dplyr)
library(stringr)
library(lubridate)

df %>%
  mutate(Date = ymd(str_extract(Filename, "\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")))

Data:

  Product Company                   Filename       Date
1       1       3   mg-tvd_bmmh_20170930.csv 2017-09-30
2       2       4 mg-tvd_bmmh_2016_06_13.csv 2016-06-13

The advantage with ymd is that it "...recognize arbitrary non-digit separators as well as no separator..." So there is no need to standardize the Date character vector before parsing. For instance,

> df$Filename %>% str_extract("\\d{4}_?\\d{2}_?\\d{2}(?=\\.csv)")
[1] "20170930"   "2016_06_13"
acylam
  • 18,231
  • 5
  • 36
  • 45
0

The error you show is originating because special characters in regex need to be double escaped in R (e.g. \d should be \\d). I would suggest using sub for the regex portion so you can control the output, and adding wildcards (*) after the underscores to get matches if there is or is not an underscore (like your example shows).

Formatting in as.Date wants a capital Y (%Y) for year.

The updated code would be:

df <- mutate(df, Date = sub(pattern = ".*_(\\d{4})_*(\\d{2})_*(\\d{1,2}).*", "\\2/\\3/\\1", df$Filename))  
df$Date <- as.Date(df$Date,format = "%m/%d/%Y")
Evan
  • 1,960
  • 4
  • 26
  • 54