Use Regular Expression to retrieve Url in the row with more than one Url

Question

This is an example string.

<p style="text-align: center;"><a href="http://www.evangelical-library.org.uk" target="_blank"><img class="aligncenter wp-image-22582 size-full" src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" width="372" height="225" /></a></p>

There are two URLs in this one row: one points to a PNG image, the other to a web page. I want to get only the PNG URL, i.e. the one matching the pattern "http:.....png".

I simply tried "http://.*?png", but it matches everything from the first "http://" through to the end of the second URL, the one with the .png file extension.
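The behaviour described above can be reproduced with a minimal Python sketch. The `html` string is a shortened stand-in for the sample row (only the two URL-bearing attributes are kept), and the variable names are illustrative assumptions:

```python
import re

# Shortened stand-in for the sample row: only href and src are kept.
html = ('<a href="http://www.evangelical-library.org.uk" target="_blank">'
        '<img src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" /></a>')

# A lazy quantifier only controls where the match ENDS. The regex engine
# still starts at the leftmost "http://", which belongs to the href URL,
# and then expands until the first "png" it can reach.
overlong = re.search(r'http://.*?png', html).group(0)

print(overlong)
```

The printed match begins at the web page URL and runs across the attribute boundary into the image URL, which is why laziness alone does not isolate the PNG link.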

For now I can work around this by checking for href and src to tell which attribute holds the PNG URL, but that misses many PNG URLs that appear in other forms, such as <png>PNG URL</png>.

How can this be solved? Thanks.

[Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) — Biffen, Nov 17 '14 at 07:38

score 0 · Accepted Answer · answered Nov 17 '14 at 07:55

Uhmm, don't parse HTML with regex, as Biffen commented, but you can extract bits, e.g.:

(?<=href=")[^"]+\.png

This does a lookbehind for href=" at the start of the pattern, then matches every character that isn't a " up to the .png at the end. The dot is escaped so that it matches a literal "." rather than any character.
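As a hedged illustration of the lookbehind idea in Python: in the sample row the PNG URL sits in the src attribute, so this sketch anchors the lookbehind on src=" instead. The second pattern is not from the answer; it is an assumed, attribute-independent variant that excludes quotes, whitespace and angle brackets from the URL, so it also picks up the <png>...</png> form mentioned in the question. The `html` string and the `http://example.com/other.png` entry are made up for the example.

```python
import re

# Shortened sample row plus a hypothetical <png>...</png> entry.
html = ('<a href="http://www.evangelical-library.org.uk" target="_blank">'
        '<img src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" /></a>'
        '<png>http://example.com/other.png</png>')

# Lookbehind anchored on src=" ; [^"]+ cannot cross the closing quote,
# so the match stays inside a single attribute value.
by_attribute = re.findall(r'(?<=src=")[^"]+\.png', html)

# Assumed variant: no lookbehind, but the URL body may not contain
# whitespace, quotes or angle brackets, so it cannot span two URLs.
anywhere = re.findall(r'https?://[^\s"\'<>]+\.png', html)

print(by_attribute)
print(anywhere)
```

The key design choice in both patterns is replacing `.*?` with a negated character class, which stops the match at the delimiter that ends the URL. This remains a quick extraction trick rather than a substitute for an HTML parser.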

Spending an hour learning regex will save you time coming here.

gwillie

The lookbehind should be for `src`, not `href` – SBH Nov 17 '14 at 09:16