I have a regular expression that matches my data when used with grepl, but I can't figure out how to extract its sub-expressions into new columns.
The code below returns the entire test string as foo, without any of the sub-expressions:
entryPattern <- "(\\d+)\\s+([[:lower:][:blank:]-]*[A-Z][[:alpha:][:blank:]-]+[A-Z]\\s[[:alpha:][:blank:]]+)\\s+([A-Z]{3})\\s+(\\d{4})\\s+(\\d\\d\\-\\d\\d)\\s+([[:print:][:blank:]]+)\\s+(\\d*\\:?\\d+\\.\\d+)"
test <- "101 POULET Laure FRA 1992 25-29 E. M. S. Bron Natation 26.00"
m <- regexpr(entryPattern, test)
foo <- regmatches(test, m)
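For reference, regexpr records only the position of the overall match, which is why regmatches has nothing but the whole string to return here. A minimal sketch of one possible alternative, assuming base R's regexec (which also records the capture groups) is acceptable; the name parts is hypothetical and used only for illustration:

```r
entryPattern <- "(\\d+)\\s+([[:lower:][:blank:]-]*[A-Z][[:alpha:][:blank:]-]+[A-Z]\\s[[:alpha:][:blank:]]+)\\s+([A-Z]{3})\\s+(\\d{4})\\s+(\\d\\d\\-\\d\\d)\\s+([[:print:][:blank:]]+)\\s+(\\d*\\:?\\d+\\.\\d+)"
test <- "101 POULET Laure FRA 1992 25-29 E. M. S. Bron Natation 26.00"

# regexec() keeps the start and length of every parenthesised group,
# not just those of the overall match.
m <- regexec(entryPattern, test)

# regmatches() returns a list with one character vector per input string:
# element 1 is the full match, elements 2 to 8 are the seven capture groups.
parts <- regmatches(test, m)[[1]]

print(parts[-1])  # the sub-expressions only
```

Dropping the first element with parts[-1] leaves just the sub-expressions, in the order the groups appear in the pattern.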
In my real use case, I'm acting on lots of strings similar to test. I can identify the correctly formatted ones, so I think the pattern is correct.
rows$isMatch <- grepl(entryPattern, rows$text)
What I'm hoping to do is add the sub-expressions as new columns in the rows data frame (i.e. rows$rank, rows$name, rows$country, etc.). Thanks in advance for any advice.
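To make the goal concrete, here is a rough sketch of the kind of result I'm after, assuming regexec plus regmatches is a reasonable way to get there. The two-row rows data frame is made up for illustration, and the column names beyond rank, name and country (year, ageGroup, club, time) are my own guesses at sensible labels:

```r
entryPattern <- "(\\d+)\\s+([[:lower:][:blank:]-]*[A-Z][[:alpha:][:blank:]-]+[A-Z]\\s[[:alpha:][:blank:]]+)\\s+([A-Z]{3})\\s+(\\d{4})\\s+(\\d\\d\\-\\d\\d)\\s+([[:print:][:blank:]]+)\\s+(\\d*\\:?\\d+\\.\\d+)"

# Hypothetical stand-in for the real data: one valid entry, one non-matching row.
rows <- data.frame(
  text = c("101 POULET Laure FRA 1992 25-29 E. M. S. Bron Natation 26.00",
           "not an entry"),
  stringsAsFactors = FALSE
)

# One character vector per row: full match plus 7 groups, or character(0) if no match.
matches <- regmatches(rows$text, regexec(entryPattern, rows$text))

# Keep the 7 groups for matching rows and pad non-matching rows with NA,
# so every row yields a vector of the same length.
fields <- t(vapply(
  matches,
  function(m) if (length(m) == 8) m[-1] else rep(NA_character_, 7),
  character(7)
))
colnames(fields) <- c("rank", "name", "country", "year", "ageGroup", "club", "time")

# Attach the extracted fields as new columns.
rows <- cbind(rows, as.data.frame(fields, stringsAsFactors = FALSE))
```

All new columns come back as character, so something like as.integer(rows$rank) would still be needed for numeric fields.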