R: how to get grep to return the match, rather than the whole string

Question

I have what is probably a really dumb grep in R question. Apologies, because this seems like it should be so easy - I'm obviously just missing something.

I have a vector of strings, let's call it alice. Some of alice is printed out below:

T.8EFF.SP.OT1.D5.VSVOVA#4   
T.8EFF.SP.OT1.D6.LISOVA#1  
T.8EFF.SP.OT1.D6.LISOVA#2   
T.8EFF.SP.OT1.D6.LISOVA#3  
T.8EFF.SP.OT1.D6.VSVOVA#4    
T.8EFF.SP.OT1.D8.VSVOVA#3  
T.8EFF.SP.OT1.D8.VSVOVA#4   
T.8MEM.SP#1                
T.8MEM.SP#3                      
T.8MEM.SP.OT1.D106.VSVOVA#2 
T.8MEM.SP.OT1.D45.LISOVA#1  
T.8MEM.SP.OT1.D45.LISOVA#3

I'd like grep to give me the number after the D that appears in some of these strings, but only when the string contains "LIS"; for the other strings an empty string (or something similar) would be fine.

I was hoping that grep would return me the value of a capturing group rather than the whole string. Here's my R-flavoured regexp:

pattern <- "(?<=\\.D)([0-9]+)(?=.LIS)"

Nothing too complicated. But to get what I'm after, rather than just using grep(pattern, alice, value = TRUE, perl = TRUE), I'm doing the following, which seems bad:

reg.out <- regexpr(
    "(?<=\\.D)[0-9]+(?=.LIS)",
    alice,
    perl=TRUE
)
substr(alice,reg.out,reg.out + attr(reg.out,"match.length")-1)

Looking at it now it doesn't seem too ugly, but the amount of messing about it's taken to get this utterly trivial thing working has been embarrassing. Anyone any pointers about how to go about this properly?

Bonus marks for pointing me to a webpage that explains the difference between whatever I access with $,@ and attr.

looks like this has already been asked, and answered. Apologies for the repetition! http://stackoverflow.com/questions/2192316/extract-a-regular-expression-match-in-r-version-2-10/2192732#2192732 — Mike Dewar, Jun 04 '10 at 01:05

score 60 · Answer 1 · answered Jun 03 '10 at 22:47

Try the stringr package:

library(stringr)
str_match(alice, ".*\\.D([0-9]+)\\.LIS.*")[, 2]
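
For cases where stringr is not available, the same "return the capture group" idea can be sketched in base R with regexec() and regmatches(). This is only an illustration under assumptions: the sample vector is a subset of the strings in the question, and first_group is a hypothetical helper name, not part of any package.

```r
# Sample of the strings from the question
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4",
           "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1",
           "T.8MEM.SP.OT1.D45.LISOVA#3")

pat <- "\\.D([0-9]+)\\.LIS"

# regexec() records the position of the full match and of each capture group;
# regmatches() then extracts the corresponding substrings as a list.
m <- regmatches(alice, regexec(pat, alice))

# first_group is a hypothetical helper: element 1 is the whole match,
# element 2 is the first capture group; unmatched strings give character(0).
first_group <- function(x) if (length(x) >= 2) x[2] else NA_character_

days <- vapply(m, first_group, character(1))
days
```

Strings that do not match come back as NA here, which makes them easy to tell apart from real matches.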

hadley

brilliant. Don't suppose there are plans for stringr to use perl regexps? Or is it generally the case that one should always use R's dialect? – Mike Dewar Jun 04 '10 at 00:58
@Mike you can use perl regexps in `stringr` by wrapping the regex string in `perl()`. See `?perl`. – Sam Firke May 08 '15 at 13:43
@SamFirke not any more – hadley May 08 '15 at 20:39
@hadley this works for me: `str_extract("20004ABCreturnthispartDE", perl("(?<=ABC)(.*)(?=DE)"))` but not without the `perl()`. – Sam Firke May 08 '15 at 20:47
You don't have the latest version of stringr (1.0.0) – hadley May 11 '15 at 14:40
@SamFirke now uses "regex" instead of "perl" string – Ferroao Jun 09 '17 at 17:50

Ken Williams · Accepted Answer · 2017-12-10T04:17:44.217

39

You can do something like this:

pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice)

If you only want the subset of alice where your pattern matches, try this:

pat <- ".*\\.D([0-9]+)\\.LIS.*"
sub(pat, "\\1", alice[grepl(pat, alice)])
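
One thing to keep in mind is that sub() returns the input unchanged when the pattern does not match, so the first form leaves the non-LIS strings as they were. If an empty string is wanted for those instead, while keeping the result the same length as alice, a minimal sketch (using a small sample of the question's strings; the names hit and days are illustrative only) could be:

```r
# Sample of the strings from the question
alice <- c("T.8EFF.SP.OT1.D5.VSVOVA#4",
           "T.8EFF.SP.OT1.D6.LISOVA#1",
           "T.8MEM.SP#1",
           "T.8MEM.SP.OT1.D45.LISOVA#3")

pat <- ".*\\.D([0-9]+)\\.LIS.*"

# TRUE where the pattern matches, FALSE elsewhere
hit <- grepl(pat, alice)

# Keep the captured number for matches, an empty string for everything else
days <- ifelse(hit, sub(pat, "\\1", alice), "")
days
```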

edited Dec 10 '17 at 04:17

answered Jun 03 '10 at 20:49

Ken Williams

awesome. Thanks so much. I hadn't thought of replacing the line with the match, rather I was obsessively thinking "why on earth won't it return me the match arggghhh!". I should probably stop using those lookahead and lookbehind things as well eh? My brain doesn't work with regexp very well yet. I seem to think backwards. – Mike Dewar Jun 03 '10 at 21:16
I just came across the regex feature Ken Williams used above. It is amazing. I believe it is called tagging. One can write a regex and one or more parts of it can be placed in unescaped brackets / parentheses. The sub or gsub function allows one to use \\1 or \\2 to paste in the match that was in the first pair of unescaped brackets or the second pair of unescaped brackets respectively. Read more here http://books.google.com/books?id=grfuq1twFe4C&lpg=PP1&pg=PA99#v=onepage&q=unescaped&f=false – Farrel Jun 11 '10 at 11:22
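
To illustrate the \\1 and \\2 backreferences described in the comment above, here is a small sketch with two capture groups. The pattern and the replacement text are made up for this example; only the two strings come from the question.

```r
x <- c("T.8MEM.SP.OT1.D45.LISOVA#1",
       "T.8EFF.SP.OT1.D6.VSVOVA#4")

# Group 1 captures the digits after "D", group 2 the letters before "#".
# In the replacement, \\1 and \\2 insert those captured pieces,
# here in swapped order.
swapped <- sub(".*\\.D([0-9]+)\\.([A-Z]+)#.*", "\\2 day \\1", x)
swapped
```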
