
This is an example string.

<p style="text-align: center;"><a href="http://www.evangelical-library.org.uk" target="_blank"><img class="aligncenter wp-image-22582 size-full" src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" alt="" width="372" height="225" /></a></p>

There are two URLs in a row.

One is for a PNG, the other is for a web page. I want to get only the PNG URL, i.e. one matching the pattern "http:.....png".

I simply used "http://.*?png", but it matches everything from the first "http://" through to the end of the second URL, the one with the .png extension.
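To make the over-match concrete, here is a minimal Python sketch. The `html` value is a shortened, hypothetical stand-in for the example string above (the `example.org` URLs are placeholders, not the real ones).

```python
import re

# Shortened, hypothetical stand-in for the example string in the question.
html = '<a href="http://example.org"><img src="http://example.org/img/show-04.png" /></a>'

# A lazy quantifier does not change WHERE the match starts: the engine still
# begins at the first "http://" and then expands until it first sees "png".
match = re.search(r'http://.*?png', html)
overmatch = match.group(0)
print(overmatch)
```

The printed value begins at the href URL and runs across the attribute boundary into the src URL, which is the behaviour described above.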

I can work around it by keying on href and src to tell which URL is the PNG. But that will miss a lot of PNG URLs that appear in other patterns, such as <png>PNG URL</png>.

How can this be solved? Thanks.

Alan Moore

1 Answer


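Don't parse HTML with regex, as Biffen commented, but you can extract bits, e.g.:

(?<=src=")[^"]+\.png

This does a lookbehind for src=" at the start of the pattern, then matches every character that isn't a " up to the .png at the end. (In the example string the PNG is in the src attribute; the href holds the web page URL, so a lookbehind on href=" would find nothing there. The dot is escaped so it only matches a literal ".".)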
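A minimal Python sketch of the lookbehind idea, run against the example string from the question. It assumes the PNG sits in the src attribute (as it does in that string), so the lookbehind here is on src=" and the dot before png is escaped. The names `html` and `png_in_src` are only illustrative.

```python
import re

# The example string from the question.
html = ('<p style="text-align: center;">'
        '<a href="http://www.evangelical-library.org.uk" target="_blank">'
        '<img class="aligncenter wp-image-22582 size-full" '
        'src="http://the7.dream-demo.com/main/wp-content/uploads/sites/9/2014/05/show-04.png" '
        'alt="" width="372" height="225" /></a></p>')

# (?<=src=")  lookbehind: the match must be preceded by src="
# [^"]+       any run of characters that are not a double quote
# \.png       a literal ".png"
png_in_src = re.compile(r'(?<=src=")[^"]+\.png')

found = png_in_src.findall(html)
print(found)
```

Because `[^"]` cannot cross a double quote, the match stays inside a single attribute value and the href URL is never included.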

Spending an hour learning regex will save you time coming here.
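For the <png>PNG URL</png> case raised in the question, one possible variation (an assumption, not part of the pattern above) is to drop the attribute lookbehind and instead forbid the URL from containing whitespace, quotes, or angle brackets. The sketch below uses a hypothetical `text` input with placeholder `example.org` URLs.

```python
import re

# Hypothetical input mixing an attribute and the <png>...</png> form.
text = ('<a href="http://example.org/page"><img src="http://example.org/a.png" /></a>'
        '<png>http://example.org/b.png</png>')

# The character class stops at quotes, angle brackets and whitespace, so a
# match cannot run from one URL into the next. URLs that do not end in .png
# before such a delimiter simply fail to match.
PNG_URL = re.compile(r'''https?://[^\s"'<>]+?\.png\b''', re.IGNORECASE)

urls = PNG_URL.findall(text)
print(urls)
```

This is still pattern matching rather than HTML parsing, so it will also pick up PNG URLs inside comments or scripts; whether that is acceptable depends on the input.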

gwillie