0

I am trying to extract the most recent date that a report was added in an R dataframe of reports. The text always looks like Date Ordered: M/DD/YYYY and may contain 0 many times in a given report. If it's repeating, I want the most recent (usually the last) instance, and I'm trying to convert it to a date in a mutated dplyr column.

Using the code below on my actual dataframe, I get the error:

Error in if (nchar(s) > 0 && substring(s, 1, 1) == "\002") { :
missing value where TRUE/FALSE needed

However, it runs fine on a single item making me think that it's trying to concatenate the entire column.

The test code doesn't give an error, but actually pulls the last date from the last report for all instances:

     lastdate
1 1999-04-15
2 1999-04-15
dataset=data.frame(cbind(ID=c(001,002),
                         Report=c("Blah Blah Date Ordered: 5/19/2000 test is positive. Date Ordered: 4/2/2005 Additional testing negative.",
                                  "Meh Date Ordered: 4/15/1999")),
                   stringsAsFactors = F)`

dataset %>% 
  mutate(lastdate = as.Date(last(gsub("Date Ordered:\\s+", "",
                                      strapplyc(Report, 
                                                "Date Ordered:\\s*\\d+/\\d+/\\d+", simplify = TRUE))),
                            "%m/%d/%Y"))

Desired output should be:

2005-4-2
1999-4-15

Actual for dataset:

Error in if (nchar(s) > 0 && substring(s, 1, 1) == "\002") { : 
  missing value where TRUE/FALSE needed

Actual for test data:

    lastdate
1 1999-04-15
2 1999-04-15
Z.Lin
  • 28,055
  • 6
  • 54
  • 94
sm002
  • 101
  • 1
  • 10
  • Try `dataset$lastsdate <- as.Date(gsub(".*Date Ordered:\\s*(\\d{1,2}/\\d{1,2}/\\d{4}).*","\\1", dataset$Report),"%m/%d/%Y")` – Wiktor Stribiżew Sep 28 '19 at 21:19
  • I actually had to wrap it in an ifelse as occasional lines were resulting in entire sentences and cbreaking the as.Date, and one of the "Date Ordered" elements was blank, which was also problematic. This worked: Lastdate=ifelse(grepl("* Addendum *",Report) & grepl("Date Complete",Report),gsub(".*Date Complete:\\s*(\\d{1,2}/\\d{1,2}/\\d{4}).*","\\1", Report) ,NA), LastAddendum=as.Date(LastAddendum,,"%m/%d/%Y") – sm002 Oct 01 '19 at 22:58
  • You may get rid of the whole text if you add `|.*`. See my answer. – Wiktor Stribiżew Oct 01 '19 at 23:09

1 Answers1

0

I suggest a gsub like

dataset$lastsdate <- as.Date(gsub(".*Date Ordered:\\s*(\\d{1,2}/\\d{1,2}/\\d{4}).*|.*","\\1", dataset$Report),"%m/%d/%Y")

See the regex in action.

The regex matches:

  • .* - any 0+ chars as many as possible
  • Date Ordered: - a literal substring
  • \s* - 0+ whitespaces
  • (\d{1,2}/\d{1,2}/\d{4}) - Capturing group 1 (\1): 1 or 2 digits, /, 1 or 2 digits, /, 4 digits
  • .* - the rest of the string
  • | - or
  • .* - the entire string.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563