So I'm building a simple HTML scraper. The text contains some images that are used as symbols, and what I need is to get the src of these img tags while keeping each value at the same place in the text where its img tag was.

--question rewritten because I was originally asking the wrong thing.

  • If you're writing an HTML scraper, why aren't you using an HTML DOM parser instead of regexp? – Barmar Aug 01 '13 at 18:33
  • HTML can't be parsed well with regExp, use HTML parser (read http://stackoverflow.com/a/1732454/1529630) – Oriol Aug 01 '13 at 18:33
  • Please don't use anything beyond an html traversal library to parse html, this community hates that more than anything. – Korvin Szanto Aug 01 '13 at 18:33
  • I'm using HTML DOM for everything, but this part was giving me trouble because I need to know where in the text each image is, and I failed to do it with DOM –  Aug 01 '13 at 18:37
  • If you already use a DOM parser, it should be able to extract the src attribute only. Perhaps if you asked about *that* instead, it might solve the *actual* problem faster. – JJJ Aug 01 '13 at 18:39
  • Well, sorry for writing the question the wrong way, but if I extract the src attribute, how do I get it to show in the correct part of the text? –  Aug 01 '13 at 18:43

3 Answers

0

Well, I'd really suggest using a proper HTML parsing library to do this, but if it's really simple you can get away with a regex like /<img[^>]*alt="(\d+?)"/ or so.
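
As a rough sketch of how that regex could keep the value in place, the pattern can be extended to cover the whole tag and passed to preg_replace, so each img is swapped for its number right where it stood. The sample markup and the assumption that the symbol number lives in the alt attribute are hypothetical, since the question does not show the real page.

```php
<?php
// Hypothetical markup: assumes each symbol image carries its number in alt="...".
$html = 'Pay <img src="/Handlers/Image.ashx?size=small&amp;name=3&amp;type=symbol" alt="3" /> to cast.';

// Match the entire <img ...> tag and replace it with the captured digits in brackets.
$result = preg_replace('/<img[^>]*alt="(\d+?)"[^>]*>/i', '[$1]', $html);

echo $result, "\n"; // prints: Pay [3] to cast.
```

This only holds up for very regular markup; anything with nested quotes or a ">" inside an attribute value would break it, which is why the parser route is the safer one.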

Troy
0

If you want to get the name= part from the src attribute, which is /Handlers/Image.ashx?size=small&amp;name=3&amp;type=symbol, you can try this:

<?php
$src = '/Handlers/Image.ashx?size=small&amp;name=3&amp;type=symbol';
// Capture whatever follows "name=" up to the next "&" and wrap it in brackets.
$x = preg_replace('/^.*name=([^&]*).*$/i','[$1]',$src);
echo $x; // prints [3]
?>
bansi
0

Since you're using a DOM parser, you can extract the image's src attribute into a variable. After that, use the following code:

$imgsrc = '/Handlers/Image.ashx?size=small&amp;name=3&amp;type=symbol';
$url = parse_url($imgsrc);
$query = htmlspecialchars_decode($url['query']);
parse_str($query, $arr);
echo $arr['name'] . "\n"; // prints 3
anubhava
  • ok I get this, but how can I make sure it displayed at the same place where the original image was? –  Aug 01 '13 at 19:17
  • You can replace the image tag completely using [`DOMNode::replaceChild`](http://at.php.net/manual/en/domnode.replacechild.php) – anubhava Aug 01 '13 at 19:27
  • I'm going to be out for a couple hours, will test it when I get back, ty so much. –  Aug 01 '13 at 19:30
  • That was a lot harder than I thought it would be, but I managed to get it working, ty again. –  Aug 01 '13 at 22:52
  • You're welcome, glad that you got it working. – anubhava Aug 02 '13 at 03:55
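
For readers who land here later, the replaceChild approach from the comments might look roughly like the sketch below. It is not the asker's actual solution, which was never posted. The sample paragraph and the [name] placeholder format are assumptions made for illustration; only the src URL comes from the answers above.

```php
<?php
// Hypothetical input: the real page markup is not shown in the question.
$html = '<p>Pay <img src="/Handlers/Image.ashx?size=small&amp;name=3&amp;type=symbol" alt="3"> to cast.</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
$doc->loadHTML($html);
libxml_clear_errors();

// Copy the live node list into an array first: replacing nodes while
// iterating a live DOMNodeList can cause elements to be skipped.
$images = iterator_to_array($doc->getElementsByTagName('img'));

foreach ($images as $img) {
    // getAttribute() returns the value with &amp; already decoded to &.
    $query = (string) parse_url($img->getAttribute('src'), PHP_URL_QUERY);
    parse_str($query, $params);
    if (!isset($params['name'])) {
        continue; // leave images without a name parameter untouched
    }
    // Put a text node exactly where the image was (new node first, old node second).
    $placeholder = $doc->createTextNode('[' . $params['name'] . ']');
    $img->parentNode->replaceChild($placeholder, $img);
}

$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n"; // prints: Pay [3] to cast.
```

Because the text node takes the image's slot in the tree, reading textContent afterwards yields the surrounding text with the symbol value in its original position.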