gsub error extracting a URL with R: what did I miss?

Question

I tried to extract a URL, but every time I run my code it doesn't work. What did I miss? Any help would be great.

x$URL <- gsub("(.*)(http://www.bloomin.com)(.jpg)(.)",
"//2//3", x$Product.Description.)

[1] //2//3

That is what I got back. I want to get http://www.blooming.com/image/xxxxxxxx.jpg from the vector below.

<div>Colorful Floor chair Series</div><div><br /></div><div>Soft
Suede</div><div><br /></div><div>Cute bubble design</div><div><br
/></div><div><p align="center"><p align="center"><img
src="http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF"
/></div><div><p align="center"><p align="center"><img
src="http://www.blooming.com/image/xxxxxxxx.jpg" /></div>

uh oh. regex with html? http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — RJ-, Jan 21 '16 at 05:51
Not really the same case: this is about matching URLs inside HTML, not about matching HTML tags (for which the linked response is appropriate). — legoscia, Jan 21 '16 at 08:33

score 3 · Answer 1 · edited Jan 21 '16 at 06:12


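Backreferences must be referred to with a backslash, not a forward slash. Inside an R string the backslash is doubled, so the first capture group is written "\\1".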
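To make the difference concrete, here is a minimal sketch on a toy string (the input "ab" and the variable names are made up for illustration, not taken from the question). With forward slashes the replacement is inserted as literal text; with backslashes it refers to the capture groups.

```r
# Toy input, not from the question
s <- "ab"

# Forward slashes are plain characters in the replacement string
literal <- sub("(a)(b)", "//2//1", s)

# Backslashes (doubled inside an R string literal) refer to capture groups
swapped <- sub("(a)(b)", "\\2\\1", s)

print(literal)  # "//2//1"
print(swapped)  # "ba"
```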

Use .*? (non-greedy) to match the characters between .com and the file extension .jpg. The leading (?s) lets . match newlines as well, so the pattern also works when the HTML spans several lines.

x$URL <- gsub("(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*",
              "\\1", x$Product.Description., perl = TRUE)
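For anyone who wants to try this without the original data, the following is a self-contained sketch. The one-row data frame is hypothetical and the HTML is shortened from the question. It passes perl = TRUE, which is an assumption on my side so that the (?s) flag is handled by the PCRE engine.

```r
# Hypothetical one-row data frame; the HTML is shortened from the question
x <- data.frame(
  Product.Description. = paste0(
    "<div>Colorful Floor chair Series</div>\n",
    "<div><img src=\"http://gdetail.image-gemkt.com/186/716088198/2010/2/e3b117e2-a7bd-4d.GIF\" /></div>\n",
    "<div><img src=\"http://www.blooming.com/image/xxxxxxxx.jpg\" /></div>"
  ),
  stringsAsFactors = FALSE
)

# Capture the blooming.com URL up to the first ".jpg" and drop everything else
pattern <- "(?s).*\\b(http://www\\.blooming\\.com\\b.*?\\.jpg\\b).*"
x$URL <- gsub(pattern, "\\1", x$Product.Description., perl = TRUE)

print(x$URL)  # "http://www.blooming.com/image/xxxxxxxx.jpg"
```

Note that when the pattern does not match at all, gsub returns the input string unchanged, so rows without a blooming.com image would keep their full HTML in x$URL.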


Answered by Avinash Raj, Jan 21 '16 at 05:50; edited by akrun, Jan 21 '16 at 06:12.

You saved me! Thanks a lot – HoKyun Jan 21 '16 at 06:02
