Extracting an image tag from HTML code in PHP

Question

I'm trying to fetch my articles and I need to make a slider out of them.

Each of my articles has an image inside its text, like this:

<p>
<img src="story_img.jpg" width=120 height=80>
In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.
</p>

What I need is simple: first extract the image, then remove it from the text,

so that I end up with two variables:

$imgOfText = ?

$TextWithOutImg = ?

I tried different approaches in PHP and even read this topic, but I couldn't get it to work.

score 3 · Accepted Answer · answered Feb 22 '11 at 22:35

It's HTML, so you can parse it. Use DOMDocument!

$html = '<p>';
$html.= '<img src="story_img.jpg" width=120 height=80>';
$html.= 'In the last couple of weeks I often had to download a lot ';
$html.= 'of files, submitted to a web-based teaching platform. Downloading ';
$html.= 'all these files by hand is very annoying so I implemented a short ';
$html.= 'Groovy script. Since Groovy has a great support for parsing well-';
$html.= 'formed XML-like information it fails if you want to parse ';
$html.= 'unstructured and nasty HTML code.';
$html.= '</p>';
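$doc = new DOMDocument();
$doc->loadHTML($html); // parses the fragment, wrapping it in <html><body>
$p = $doc->getElementsByTagName('p')->item(0);
$img = $doc->getElementsByTagName('img')->item(0); // null if there is no <img>
$imgOfText = $img->getAttribute('src');
// nodeValue is the paragraph's text content; the <img> contributes no text
$TextWithOutImg = $p->nodeValue;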

Demo here

Shikiryu

`Call to a member function getAttribute() on a non-object`: this is what I get when I run the script. I saw the demo and it's working there, but not for me. – Mac Taylor Feb 23 '11 at 10:11
@Mac Taylor: Did you copy/paste the demo onto your server, or did you change something (for example, the HTML)? If so, can you tell me what you changed? – Shikiryu Feb 23 '11 at 10:33
I think it's not working when I'm using it in a while loop, e.g.: while ($row = $db->sql_fetchrow($result)) { $html = $row["content"]; – Mac Taylor Feb 27 '11 at 05:48
@Mac Taylor: Maybe you should print what $row['content'] contains. It may be escaped or badly encoded. – Shikiryu Mar 10 '11 at 08:19
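The `getAttribute() on a non-object` error means `item(0)` returned `null`: the parsed HTML contained no `<img>` element, for example because the row was empty or its content was stored HTML-escaped as `&lt;img ...&gt;`, as suggested in the comment above. Below is a minimal sketch, assuming PHP 7.1+ with the DOM extension, that checks for that case and removes the `<img>` from the markup instead of flattening the paragraph to plain text. `extractFirstImage()` is a hypothetical helper name, not part of the answer.

```php
<?php
// Hypothetical helper: returns [src of the first <img> or null,
// HTML with that <img> removed].
function extractFirstImage(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [null, $html];
    }

    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null) {
        // No image: calling getAttribute() here is what causes the error.
        return [null, $html];
    }

    $src = $img->getAttribute('src');
    $img->parentNode->removeChild($img);

    // loadHTML() wrapped the fragment in <html><body>; serialize only
    // the children of <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [$src, ''];
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return [$src, trim($out)];
}

$html = '<p><img src="story_img.jpg" width=120 height=80>In the last couple of weeks...</p>';
[$imgOfText, $TextWithOutImg] = extractFirstImage($html);
echo $imgOfText, "\n";
echo $TextWithOutImg, "\n";
```

Inside the `while` loop from the comment, this would be called once per `$row["content"]`, and a `null` source can simply be skipped when building the slider.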

Dutchie432 · Answer 2 · 2011-02-22T21:07:16.243

How about this Live Demo I whipped up? It's just some very basic parsing using strpos(). I'm sure this could be done with regular expressions, but I was never any good at those :)

CODE

<?php

    $html = '<p>';
    $html.= '    <img src="story_img.jpg" width=120 height=80>';
    $html.= '    In the last couple of weeks I often had to download a lot ';
    $html.= 'of files, submitted to a web-based teaching platform. Downloading ';
    $html.= 'all these files by hand is very annoying so I implemented a short ';
    $html.= 'Groovy script. Since Groovy has a great support for parsing well-';
    $html.= 'formed XML-like information it fails if you want to parse ';
    $html.= 'unstructured and nasty HTML code.';
    $html.= '</p>';

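    $imgOfText = '';
    $TextWithOutImg = $html;
    $imgStart = strpos($html, '<img');

    // strpos() returns false when there is no <img>, so check before doing arithmetic
    if ($imgStart !== false) {
        // The src value starts right after 'src="' (5 characters) and ends at the next '"'
        $spot = strpos($html, 'src="', $imgStart) + 5;
        $spot2 = strpos($html, '"', $spot);
        $imgOfText = substr($html, $spot, $spot2 - $spot);

        // Cut from '<img' up to and including the first '>' after it
        $spot2 = strpos($html, '>', $imgStart) + 1;
        $TextWithOutImg = substr($html, 0, $imgStart).substr($html, $spot2);
    }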

    echo "Image Source: ".$imgOfText."\n\n";
    echo "Text Without Image:\n".$TextWithOutImg;

?>

OUTPUT

Image Source: story_img.jpg

Text Without Image:

<p>In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.</p>

@Capsule & @Mac Taylor : What if you use `>` in your text? What if your image is embedded like this : ``? etc...etc... You should use regex or regex shortcut to find things in HTML. — Shikiryu, Feb 23 '11 at 07:10
@Shikiryu: You can use `>` in the text with no problem as the code is looking for the first `>` after ` — Dutchie432, Oct 12 '11 at 10:56
@Dutchie432: I don't see any problem with my code, since DomDocument takes `` and ``, and I meant *shouldn't*: you don't use regex for HTML. Your solution is better than regex, but it assumes a lot of things that could go wrong. (i.e.: `
text — Shikiryu, Oct 12 '11 at 15:14
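For comparison with the strpos() approach, here is a regular-expression sketch of the same two steps using preg_match() and preg_replace(). As the comments above point out, a regex is not an HTML parser: this assumes a simple, predictable `<img>` tag whose src value is quoted and which has no `>` inside any attribute value.

```php
<?php
$html = '<p><img src="story_img.jpg" width=120 height=80>In the last couple of weeks...</p>';

// Capture the src value of the first <img>, quoted with either " or '.
// Assumption: the attribute is literally named src and is quoted.
$imgOfText = null;
if (preg_match('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
    $imgOfText = $m[2];
}

// Remove only the first <img ...> tag (limit = 1). [^>]* stops at the
// first ">", so a ">" inside an attribute value would break this.
$TextWithOutImg = preg_replace('/<img\b[^>]*>/i', '', $html, 1);

echo $imgOfText, "\n";      // story_img.jpg
echo $TextWithOutImg, "\n"; // <p>In the last couple of weeks...</p>
```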

score 1 · Answer 3 · edited May 23 '17 at 12:10

Try this topic: PHP - remove <img> tag from string

answered Feb 22 '11 at 20:40

zachallia
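If the goal is only to drop the tag, PHP's built-in strip_tags() can do it with an allow-list of tags to keep. Below is a minimal sketch, assuming the articles only need `<p>` and `<strong>` preserved. Every tag not on the list, including `<img>`, is removed, so the list has to name everything else you want to keep. strip_tags() does not return the src, so it would need to be paired with one of the extraction approaches above.

```php
<?php
$html = '<p><img src="story_img.jpg" width=120 height=80>In the <strong>last</strong> couple of weeks...</p>';

// Keep <p> and <strong>; strip every other tag, including <img>.
$TextWithOutImg = strip_tags($html, '<p><strong>');

echo $TextWithOutImg, "\n"; // <p>In the <strong>last</strong> couple of weeks...</p>
```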

score 1 · Answer 4 · answered Feb 22 '11 at 23:18

There are a number of PHP libraries that can parse HTML, even invalid HTML:

PHPQuery

Simple HTML DOM

Zend DOM Query

Here is a PHPQuery example that prints the src of every img tag on the Stack Overflow home page.

<?php

$html = file_get_contents('http://stackoverflow.com');

include('phpQuery.php');

$pq = phpQuery::newDocumentHTML($html, 'utf-8');

foreach ($pq->find('img') as $img)
{
    echo pq($img)->attr('src') .'<br>';

}

?>

Another example that extracts text of all paragraphs:

foreach ($pq->find('p') as $p)
{
    echo pq($p)->text() .'<br>';

}

Sergey Kornilov

Well, yeah, but why import a library to extract it when a _native_ (almost) one can do it? That's laziness, like using jQuery to get an element's id. – Shikiryu Feb 23 '11 at 07:07
Shikiryu, if you need to parse HTML that is not well formed, you are out of luck, while these libraries can handle it. – Sergey Kornilov Mar 10 '11 at 05:45
Well, that isn't true. DOMDocument can parse malformed HTML; that's why it is powerful. For me, the only advantage of those libraries is that they are nicer to read and simpler to use. – Shikiryu Mar 10 '11 at 08:26
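To illustrate the point that DOMDocument copes with malformed HTML, here is a minimal sketch of the two phpQuery loops written with the built-in DOMDocument and DOMXPath classes. It parses a hard-coded, deliberately sloppy string (unquoted attributes, unclosed `<p>` tags) rather than fetching a live page; the sample markup is made up for illustration.

```php
<?php
// Sloppy sample HTML: unquoted attribute values and no closing </p> tags.
$html = '<p><img src=story_img.jpg width=120>First paragraph'
      . '<p>Second paragraph <img src="other.png">';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);            // a new <p> implicitly closes the previous one
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Equivalent of $pq->find('img') ... attr('src')
$srcs = [];
foreach ($xpath->query('//img[@src]') as $img) {
    $srcs[] = $img->getAttribute('src');
}

// Equivalent of $pq->find('p') ... text()
$texts = [];
foreach ($xpath->query('//p') as $p) {
    $texts[] = trim($p->textContent);
}

echo implode("\n", $srcs), "\n";
echo implode("\n", $texts), "\n";
```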
