
How to extract data between double quotes ("")

I have the following string as an example:

x <- c('"Apr 21 2020 16:45        10894 <A HREF=\"D188_2020-03-30.csv\">D188_2020-03-30.csv</A>"')

I would like to extract D188_2020-03-30.csv as the output.

I have looked at various gsub examples but was unable to figure it out.

I would appreciate any suggestions.

Tushar Lad

2 Answers


There are multiple strings between double quotes, so you need another identifier to pick out the one you want. For example, extract the string between the quotes that follow "HREF".

sub('.*HREF="(.*?)".*', '\\1', x)
#[1] "D188_2020-03-30.csv"
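To illustrate why an extra identifier such as HREF is needed, here is a small base R sketch (an alternative formulation, not part of the original answer). It first pairs double quotes naively from left to right, which yields two spans, and neither of them is the filename. It then applies a variant of the sub() call that uses the negated class [^"]* instead of the lazy .*?, so the capture cannot run past the next quote.

```r
x <- c('"Apr 21 2020 16:45        10894 <A HREF=\"D188_2020-03-30.csv\">D188_2020-03-30.csv</A>"')

# Naive approach: match quote-to-quote spans left to right (quotes included).
# Quotes get paired 1st-2nd and 3rd-4th, so the filename is never isolated.
quoted <- regmatches(x, gregexpr('"[^"]*"', x))[[1]]
print(quoted)

# Anchored approach: capture only the quoted value that follows HREF=.
# [^"]* stops at the next quote without relying on a lazy quantifier.
href <- sub('.*HREF="([^"]*)".*', '\\1', x)
print(href)
# [1] "D188_2020-03-30.csv"
```

If a string can contain several HREF attributes, note that the leading greedy .* makes sub() return the value of the last one.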
Ronak Shah

Here is another alternative using the str_extract function from the stringr package.

library(stringr)
str_extract(string = x, pattern = "(?<=HREF=\").*(?=.>D188)")

This returns the text that is preceded by HREF=" (the lookbehind (?<=HREF=\")) and followed by any single character plus >D188 (the lookahead (?=.>D188)). Neither lookaround is included in the result.

# [1] "D188_2020-03-30.csv"
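The lookahead (?=.>D188) ties the pattern to this particular filename. A minimal base R sketch of the same lookbehind idea that avoids hard-coding D188 is shown below, assuming the target is the value of the first HREF attribute and contains no double quotes. In base R, lookbehind requires the PCRE engine, hence perl = TRUE.

```r
x <- c('"Apr 21 2020 16:45        10894 <A HREF=\"D188_2020-03-30.csv\">D188_2020-03-30.csv</A>"')

# (?<=HREF=") : lookbehind, must be preceded by HREF=" (not part of the match)
# [^"]+       : one or more characters up to, but not including, the next quote
m <- regexpr('(?<=HREF=")[^"]+', x, perl = TRUE)
file_name <- regmatches(x, m)
print(file_name)
# [1] "D188_2020-03-30.csv"
```

With stringr loaded, the same pattern should also work as str_extract(x, '(?<=HREF=")[^"]+'). To collect every HREF value in a string rather than the first, gregexpr() can be used in place of regexpr().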
Sri Sreshtan