I have a line of source code that looks like this:

 <img alt="this field is variable" title="this one too" itemprop="photo" border="0" style="width:608px;" src="imgurl.jpg">

There are lots of other images on the site, so I can't just preg_match all images; I need this specific one. I had a lot of trouble writing a specific preg_match, because the content of the alt and title attributes is variable. Does anyone know how to do this? Thanks in advance.

itemprop="photo" is the one thing that is unique to this picture.

Imbue
  • Well obviously we cannot help if you don't tell us how to distinguish _this_ img tag from _others_! – arkascha Apr 10 '14 at 13:33
  • How do these tags stand out, what makes them unique? If they don't, is there any way to make them? Do you have the ability to add something to them? – Nicolai Krüger Apr 10 '14 at 13:33
  • You could do [something like this](http://stackoverflow.com/questions/6651303/regex-match-img-tag-with-certain-attribute-class) but change class to itemprop otherwise [this is quite helpful for explaining regexes](http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string) – Pete Apr 10 '14 at 13:34
  • Yeah, sorry for my bad description. The itemprop="photo" is what separates this image from the other images in the file. And I have no ability to add anything, unfortunately. – Imbue Apr 10 '14 at 13:34

2 Answers

This regex should work:

preg_match('/<img[^>]*itemprop="photo"[^>]*src="([^"]+)">/',$source,$matches);

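An explanation of the regex: `<img` matches the start of the tag, `[^>]*` skips any attributes without leaving the tag, `itemprop="photo"` requires the distinguishing attribute, a second `[^>]*` skips further attributes, and `src="([^"]+)">` captures the URL in group 1. As written, the pattern expects src to be the last attribute, directly before the closing `>`, as it is in the example.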

The result will be in the array $matches; the captured src URL is in $matches[1].
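If the attribute order is not guaranteed, a lookahead can make the pattern less strict. The following is a minimal sketch, not a general HTML parser: it assumes double-quoted attribute values that never contain `>`, and the `$source` sample string and the `$photoSrc` variable are made up for illustration.

```php
<?php
// Hypothetical sample markup: only the second image carries itemprop="photo".
$source = '<img alt="logo" src="logo.png">'
        . '<img alt="this field is variable" title="this one too" itemprop="photo"'
        . ' border="0" style="width:608px;" src="imgurl.jpg">';

// (?=[^>]*\sitemprop="photo") checks that the attribute appears somewhere
// inside this tag; src is then captured wherever it sits among the attributes.
$pattern = '/<img\b(?=[^>]*\sitemprop="photo")[^>]*\ssrc="([^"]+)"[^>]*>/i';

// preg_match() returns 1 on a match; group 1 holds the src value.
$photoSrc = preg_match($pattern, $source, $matches) === 1 ? $matches[1] : null;
```

Because the lookahead and the capture are independent, the pattern matches whether itemprop comes before or after src.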

L3viathan

Using regex to parse HTML is generally not a good idea. Why not use DOMDocument to search for your elements? PHP provides these objects for parsing an HTML document and examining its elements, which is much easier than trying to find them with regex. You can also manipulate the HTML far more easily, depending on what you are trying to accomplish.

$dom = new DOMDocument();
// $source holds the page's HTML as a string.
$dom->loadHTML($source);

$imgs = $dom->getElementsByTagName('img');
$photos = [];
foreach ($imgs as $img) {
    // getAttribute() returns '' when the attribute is missing,
    // so no separate existence check is needed. Note the comparison (===):
    // a single = would assign 'photo' and match every image.
    if ($img->getAttribute('itemprop') === 'photo') {
        $photos[] = $img->getAttribute('src');
    }
}

This code gives you an array with the src attribute of every img that has your property, without depending on how the elements are written or on anything else in the actual text of the HTML.
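The same lookup can also be expressed as a single XPath query through DOMXPath. This is a sketch under the assumption that the page is available as a string; the `$html` sample below is invented for illustration, and libxml warnings are silenced because real-world HTML is often not strictly valid.

```php
<?php
// Hypothetical sample page; only one image carries itemprop="photo".
$html = '<html><body>'
      . '<img alt="logo" src="logo.png">'
      . '<img alt="this field is variable" title="this one too" itemprop="photo"'
      . ' border="0" style="width:608px;" src="imgurl.jpg">'
      . '</body></html>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select the src attribute node of every img whose itemprop is "photo".
$xpath = new DOMXPath($dom);
$photos = [];
foreach ($xpath->query('//img[@itemprop="photo"]/@src') as $srcAttr) {
    $photos[] = $srcAttr->nodeValue;
}
```

The XPath expression keeps the filtering in one place, which can be easier to adjust than a loop if the selection criteria change later.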

Schleis
  • You shouldn't use regex to parse HTML in general (you can't), but if you have a clearly defined pattern, [regex can be the tool of choice](http://stackoverflow.com/a/1733489/1016216). – L3viathan Apr 11 '14 at 00:54