
I want to extract a group of strings between two punctuation marks in R (using RStudio).

I tried to use the str_extract command, but whenever I used anchors (^ for the start of the string and $ for the end), it failed.

Here is the sample problem:

> text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"

Here is the sample code I used:

> str_extract(text,"(Name : )(.+)?( ;)")  
> str_match(str_extract(text,"(Name : )(.+)?( ;)"),"(Name : )(.+)?( ;)")[3]

But this seems too verbose and not flexible.

I only want to extract "Dr. CHARLES DOWNING MAP".

Can anyone help with this problem?

Can I tell the regex to start at the first non-whitespace character after "Name : " and to end just before " ; POB"?

Roman Luštrik
bhjxsb

2 Answers


This seems to work, although the result keeps the space that follows the colon.

> gsub(".*Name :(.*) ;.*", "\\1", text)
[1] " Dr. CHARLES DOWNING MAP"
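The leading space in that result can be avoided by matching the surrounding whitespace outside the capture group. The following is a minimal base R sketch, not the answerer's own code; the variable names name_sub and name_look are made up for illustration, and perl = TRUE is used so that the lazy quantifier and the lookarounds are handled by the PCRE engine.

```r
text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"

# Variant 1: sub() with the whitespace consumed outside the capture group.
# (.*?) is lazy, so the match stops at the first semicolon.
name_sub <- sub("^.*Name\\s*:\\s*(.*?)\\s*;.*$", "\\1", text, perl = TRUE)

# Variant 2: lookbehind/lookahead, so only the wanted text is matched at all.
name_look <- regmatches(text, regexpr("(?<=Name : ).*?(?= ;)", text, perl = TRUE))

print(name_sub)
print(name_look)
```

Both variants should give the name without surrounding spaces. The second one hard-codes the exact spacing of "Name : " and " ;", so it is the stricter of the two.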
Roman Luštrik

With str_match:

stringr::str_match(text, "^Name : (.*) ;")[, 2]
#[1] "Dr. CHARLES DOWNING MAP"

[, 2] selects the second column of the result, which holds the contents of the capture group.
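To see why the second column is the one to take, it can help to look at the whole matrix that str_match returns: one row per input string, the full match in column 1, and one further column per capture group. A small sketch, assuming the stringr package is installed (the variable name m is only for illustration):

```r
library(stringr)

text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"

# str_match() returns a character matrix:
# column 1 is the full match, column 2 is the first capture group.
m <- str_match(text, "^Name : (.*) ;")

print(dim(m))   # one row (one input string), two columns
print(m[, 1])   # full match, including "Name : " and " ;"
print(m[, 2])   # contents of the capture group only
```

Because text is a single string here, m has exactly one row; for a vector of strings there would be one row per element, and m[, 2] would return all the extracted values at once.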


There is also qdapRegex::ex_between, which extracts the string between a left and a right marker:

qdapRegex::ex_between(text, "Name : ", ";")[[1]]
#[1] "Dr. CHARLES DOWNING MAP"
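Since the question also asks for something more flexible, the same idea can be wrapped in a small helper that pulls out any "key : value;" field. This is only a sketch in base R under the assumption that every field follows that layout; extract_field is a hypothetical name, not a function from stringr or qdapRegex.

```r
text <- "Name : Dr. CHARLES DOWNING MAP ; POB : London; Age/DOB : 53 years / August 05, 1958;"

# Hypothetical helper: return the value that follows `key :` up to the next ";".
# \\Q...\\E quotes the key literally, so characters such as "/" or "." are safe
# (this assumes the key itself does not contain the sequence \E).
extract_field <- function(text, key) {
  pattern <- paste0("\\Q", key, "\\E\\s*:\\s*(.*?)\\s*;")
  hit <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  # hit[1] is the full match, hit[2] the capture group; NA if nothing matched.
  if (length(hit) < 2) NA_character_ else hit[2]
}

print(extract_field(text, "Name"))
print(extract_field(text, "POB"))
print(extract_field(text, "Age/DOB"))
```

The helper takes a single string; for a character vector it could be applied with vapply or sapply.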
Ronak Shah