
I tried to extract a URL, but every time I run my code it doesn't work. What am I missing? Any help would be great.

    x$URL <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
                  "//2//3", x$Product.Description.)

    [1] //2//3

That is what it returned. I want to get http://www.blooming.com/image/xxxxxxxx.jpg back from the vector below.

    <div>Colorful Floor chair Series</div><div><br /></div><div>Soft
    Suede</div><div><br /></div><div>Cute bubble design</div><div><br
    /></div><div><p align="center"><p align="center"><img
    src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF"
    /></div><div><p align="center"><p align="center"><img
    src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>
Avinash Raj
HoKyun
  • Uh oh. Regex with HTML? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – RJ- Jan 21 '16 at 05:51
  • Not really the same case: this is about matching URLs inside HTML, not about matching HTML tags (for which the linked response is appropriate). – legoscia Jan 21 '16 at 08:33

1 Answer

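  1. Backreferences must be written with a backslash, not a forward slash. Inside an R string literal the backslash itself has to be escaped, so group 1 is written "\\1".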

  2. Use .*? (non-greedy) to match the characters between .com and the file extension .jpg, so the match stops at the first .jpg instead of the last one.

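    # perl = TRUE is required: R's default (TRE) engine rejects the inline (?s) flag.
    # (?s) lets . match the newlines inside the HTML string.
    x$URL <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
                  "\\1", x$Product.Description., perl = TRUE)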

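Both points can be checked on toy strings. This is a minimal base-R sketch; the input strings and the http://x.com URLs are invented for illustration, and perl = TRUE is passed so that the lazy quantifier .*? is handled by the PCRE engine.

```r
# Point 1: forward slashes are literal text in the replacement,
# so "//2//1" is inserted as-is rather than treated as backreferences.
literal <- gsub("(a)(b)", "//2//1", "ab")   # "//2//1"

# A backreference is a backslash plus the group number; the backslash
# must itself be escaped inside an R string literal, hence "\\2\\1".
swapped <- gsub("(a)(b)", "\\2\\1", "ab")   # "ba"

# Point 2: greedy vs. lazy matching on a string with two .jpg URLs.
s <- "http://x.com/a.jpg and http://x.com/b.jpg"

# Greedy .* runs to the LAST ".jpg", so group 1 spans both URLs.
greedy <- sub("(http://x\\.com.*\\.jpg).*", "\\1", s, perl = TRUE)

# Lazy .*? stops at the FIRST ".jpg", so group 1 is a single URL.
lazy <- sub("(http://x\\.com.*?\\.jpg).*", "\\1", s, perl = TRUE)

print(c(literal, swapped, greedy, lazy))
```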

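To see the fix end to end, here is a self-contained sketch. The data frame x is a hypothetical stand-in built from a shortened copy of the HTML in the question (only the column name Product.Description. comes from the question), and its second row is an invented entry with no matching URL. The regexpr()/regmatches() variant at the end is a different technique from the answer's gsub() approach, included as an assumption-labelled alternative because gsub() returns non-matching rows unchanged rather than as NA.

```r
# Hypothetical stand-in for the asker's data frame.
x <- data.frame(
  Product.Description. = c(
    paste0('<div>Soft Suede</div><div><p align="center"><img\n',
           'src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF"\n',
           '/></div><div><p align="center"><img\n',
           'src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>'),
    "<div>No image here</div>"   # invented row with no blooming.com URL
  ),
  stringsAsFactors = FALSE
)

# The answer's pattern; perl = TRUE so that (?s), \b and .*? are supported.
pattern <- "(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*"
x$URL <- gsub(pattern, "\\1", x$Product.Description., perl = TRUE)
# Caveat: a row with no match comes back unchanged, not as NA.

# Alternative (not from the answer): regexpr() finds the first match per row,
# regmatches() extracts only the matched text; rows without a match stay NA.
url_pattern <- "http://www\\.blooming\\.com/\\S*?\\.jpg\\b"
m <- regexpr(url_pattern, x$Product.Description., perl = TRUE)
x$URL2 <- NA_character_
x$URL2[m != -1] <- regmatches(x$Product.Description., m)

print(x[, c("URL", "URL2")])
```

The regmatches() route avoids wrapping the whole string in .* on both sides, which makes the pattern shorter and makes missing URLs easy to spot with is.na().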
akrun
Avinash Raj