
I want to get the src of an image based on its class or id. For example, an HTML page has many <img src="url"> tags, but only one has a class or id: <img src="url" class="image"> or <img src="url" id="image">. How can I get the src attribute of the image that has a specific class or id? Please use regex, not DOM.

To explain why I don't want to use DOM or other libraries: I'm fetching an HTML page from another site that does not allow fopen, file_get_contents, or DOM; only cURL works. I have a reason for not using libraries like simplehtmldom: sometimes it is impossible to fetch the remote HTML page with them, so I have to write some scripts myself.

goni
  • DOM is the right tool for this job. – lonesomeday Jun 05 '11 at 22:36
  • OK, to explain why I don't want to use DOM or other libraries: I'm fetching an HTML page from another site that does not allow fopen, file_get_contents, or DOM; only cURL works. I have a reason for not using libraries like simplehtmldom: sometimes it is impossible to fetch the remote HTML page with them, so I have to write some scripts myself. – goni Jun 05 '11 at 22:53

2 Answers


You say that you don't want to use DOM libraries because you need to use cURL. That's fine: DOMDocument::loadHTML and simplexml_load_string both take string arguments, so you can get your string from cURL and load it into your DOM library.

For instance:

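$html = curl_exec($ch); // assumes $ch is a cURL handle with CURLOPT_RETURNTRANSFER set

libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$dom = new DOMDocument;
$dom->loadHTML($html); // load the string from cURL into the DOMDocument object

// option 1: look the element up by ID
$el = $dom->getElementById('image');

// option 2: look it up by class (matches only when the attribute is exactly "image")
$xpath = new DOMXPath($dom);
$els = $xpath->query('//img[@class="image"]');
$el = $els->item(0);

// $el is null if nothing matched, so check it before calling this on a real page
$src = $el->getAttribute('src');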
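
The class query above only matches when the class attribute is exactly "image". If the element may carry several classes, a token-based XPath test is one way to handle it. The following is a minimal sketch, not part of the original answer: the helper name imgSrcByClass and the sample markup are made up for illustration, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: returns the src of the first <img> that has $class
// among its space-separated class names, or null if there is none.
function imgSrcByClass(string $html, string $class): ?string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    // Pad the attribute with spaces so $class matches only as a whole token.
    $query = '//img[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    $els = $xpath->query($query);

    // item(0) would be null on an empty result, so check the length first.
    return $els->length > 0 ? $els->item(0)->getAttribute('src') : null;
}

// Sample markup standing in for the string returned by curl_exec().
$sample = '<html><body>'
        . '<img src="a.png">'
        . '<img src="b.png" class="thumb image large">'
        . '</body></html>';

$src = imgSrcByClass($sample, 'image');
```

Here $src ends up holding the src of the second image, because "image" is one of its class tokens; a name that is absent from the page yields null instead of an error.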
lonesomeday

If you absolutely have to use regex, here it is:

<img(?:[^>]+src="([^"]+)"[^>]+(?:id|class)="image"|[^>]+(?:id|class)="image"[^>]+src="([^"]+)")

That said, the right way to do it is to use jQuery or a similar DOM-parsing technique. Don't use the regex unless you have a very good reason to because it will miss many cases (for example, it won't work if single quotes are used instead of double quotes or if there are spaces before "image").
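
For completeness, here is a rough sketch of how a pattern of this shape could be applied in PHP with preg_match. It is an illustration only: the helper name imgSrcByRegex and the sample strings are invented, and the src capture is written as [^"]+ so that it cannot run past the closing quote into a neighbouring tag.

```php
<?php
// Hypothetical helper: returns the src of the first <img> whose id or class
// attribute is exactly "image" (double-quoted), or null if there is none.
function imgSrcByRegex(string $html): ?string
{
    $pattern = '~<img(?:[^>]+src="([^"]+)"[^>]+(?:id|class)="image"'
             . '|[^>]+(?:id|class)="image"[^>]+src="([^"]+)")~';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Group 1 is filled when src comes first, group 2 when id/class comes first.
    return $m[1] !== '' ? $m[1] : $m[2];
}

// Sample inputs standing in for the string returned by curl_exec().
$srcFirst = imgSrcByRegex('<img src="a.png"><img src="b.png" class="image">');
$idFirst  = imgSrcByRegex('<img id="image" src="x.jpg">');
$single   = imgSrcByRegex("<img src='x.jpg' class='image'>");
```

The last call returns null because the attributes use single quotes, which is one of the cases the pattern misses.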

Evgeny Shadchnev
  • Because there is no way to do this for a remote HTML page. Sometimes sites are not reachable with fopen or file_get_contents. – goni Jun 05 '11 at 23:00
  • But if you have html source in memory, you should be able to give it to the DOM parser of your choice. I'm sorry, I'm not a PHP dev, so I'm not too familiar with specific ones but I'm sure they exist. – Evgeny Shadchnev Jun 05 '11 at 23:06