Possible Duplicate:
How to extract img src, title and alt from html using php?

I am trying to parse a webpage and get the full-size images. For example:

<html>

<body>
<a href='1.jpg'><img src='tn1.jpg' /></a>
<a href='2.jpg'><img src='tn2.jpg' /></a>
<a href='3.jpg'><img src='tn3.jpg' /></a>
<a href='4.jpg'><img src='tn4.jpg' /></a>
</body>
</html>

I want to capture the href values, which point to the full-size images behind the thumbnails:

1.jpg
2.jpg
3.jpg
4.jpg

My PHP Regex code is:

$text = file_get_contents($website); //Get webpage

preg_match_all("~$[0-9](.*?)\.jpg~i", $text, $matches);

But when I run it, the $matches array is empty even though the paths are present in the page. What might be wrong with my regex?

INFO: All images follow a pattern and are all a number followed by .jpg

William The Dev
    This has been asked many times. Try http://stackoverflow.com/questions/138313/how-to-extract-img-src-title-and-alt-from-html-using-php or http://stackoverflow.com/questions/2120779/regex-php-isolate-src-attribute-from-img-tag or http://stackoverflow.com/questions/11406453/how-to-get-link-from-img-tag – Oldskool Dec 16 '12 at 12:47

1 Answer

I don't recommend using regex to parse HTML, but if you don't want to do this properly with a parser, here's a regex that does the trick: /(?<=['"])\d+\.jpg/i

Demo here: http://regex101.com/r/xC8nP2
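As a minimal sketch of how that pattern could be used from PHP, the snippet below runs it against the sample markup from the question. The inline $text string is an assumption standing in for file_get_contents($website). The lookbehind requires a quote directly before the digits, so the tn1.jpg thumbnails are skipped.

```php
<?php
// Sample markup from the question; in real use this would come from
// file_get_contents($website).
$text = <<<HTML
<html>
<body>
<a href='1.jpg'><img src='tn1.jpg' /></a>
<a href='2.jpg'><img src='tn2.jpg' /></a>
<a href='3.jpg'><img src='tn3.jpg' /></a>
<a href='4.jpg'><img src='tn4.jpg' /></a>
</body>
</html>
HTML;

// The pattern is written as a single-quoted PHP string, so only the
// single quote inside the character class needs escaping.
// (?<=['"]) asserts a quote immediately before the match without
// including it in the result.
preg_match_all('/(?<=[\'"])\d+\.jpg/i', $text, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]);
```

With this input, $matches[0] contains 1.jpg, 2.jpg, 3.jpg and 4.jpg.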

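The problem with your regex is the misuse of the $-anchor. Without the multiline modifier, $ only matches at the end of the subject, so the [0-9] that follows it can never match and the pattern finds nothing.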
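For the parser-based route recommended over regex, here is a rough sketch using PHP's DOMDocument. It assumes the dom extension is available, and the $html string and $fullImages variable are illustrative names, not part of the question's code. It reads the href of every link and keeps those that are a number followed by .jpg.

```php
<?php
// Illustrative input; in real use this would be the fetched page.
$html = "<html><body>"
      . "<a href='1.jpg'><img src='tn1.jpg' /></a>"
      . "<a href='2.jpg'><img src='tn2.jpg' /></a>"
      . "<a href='3.jpg'><img src='tn3.jpg' /></a>"
      . "<a href='4.jpg'><img src='tn4.jpg' /></a>"
      . "</body></html>";

// Collect parser warnings internally instead of printing them,
// since real-world markup is often imperfect.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$fullImages = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Keep only hrefs that are digits followed by .jpg, matching the
    // naming pattern described in the question.
    if (preg_match('/^\d+\.jpg$/i', $href)) {
        $fullImages[] = $href;
    }
}

print_r($fullImages);
```

Because the thumbnails live in img src attributes rather than in a href, they are never considered, so no lookbehind is needed here.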

Firas Dib