0

I'm trying to fetch my articles and I need to make a slider out of them.

Each of my articles has an image inside its text, like this:

<p>
<img src="story_img.jpg" width=120 height=80>
In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.
</p>

What I need is simple: first extract the image from the text, then remove it from the text.

That way I end up with two variables:

$imgOfText = ?

$TextWithOutImg = ?

I tried different approaches in PHP and even read this topic, but I couldn't get it to work.

Mac Taylor

4 Answers

3

It's HTML, so you can parse it! Use DOMDocument:

$html = '<p>';
$html.= '<img src="story_img.jpg" width=120 height=80>';
$html.= 'In the last couple of weeks I often had to download a lot ';
$html.= 'of files, submitted to a web-based teaching platform. Downloading ';
$html.= 'all these files by hand is very annoying so I implemented a short ';
$html.= 'Groovy script. Since Groovy has a great support for parsing well-';
$html.= 'formed XML-like information it fails if you want to parse ';
$html.= 'unstructured and nasty HTML code.';
$html.= '</p>';
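// Parse the fragment; loadHTML() tolerates the unquoted attributes.
$doc = new DOMDocument();
$doc->loadHTML($html);
$p = $doc->getElementsByTagName('p')->item(0);
$img = $doc->getElementsByTagName('img')->item(0);
// The image path is the src attribute of the first <img>.
$imgOfText = $img->getAttribute('src');
// nodeValue holds only the text inside <p>, so the <img> is left out.
$TextWithOutImg = $p->nodeValue;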

Demo here

Shikiryu
  • `Call to a member function getAttribute() on a non-object` is what I get when I run the script. I saw the demo and it works there, but not for me. – Mac Taylor Feb 23 '11 at 10:11
  • @Mac Taylor : Did you copy/paste the demo onto your server, or did you change something (for example, the HTML)? If so, can you tell me what you changed? – Shikiryu Feb 23 '11 at 10:33
  • I think it's not working when I use it in a while loop, e.g.: while ($row = $db->sql_fetchrow($result)) { $html = $row["content"]; – Mac Taylor Feb 27 '11 at 05:48
  • @Mac Taylor : Maybe you should display what $row['content'] contains. It may be escaped or badly encoded. – Shikiryu Mar 10 '11 at 08:19
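The "non-object" error in the comments above is what PHP reports when `item(0)` returns null, i.e. when a row contains no `<img>` element (for instance because the stored content is empty or escaped). Below is a minimal sketch of a guarded version for use inside a loop. The helper name `splitImageFromHtml` and the `$rows` array are hypothetical stand-ins for the asker's database loop, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns [image src or null, text without the image].
function splitImageFromHtml(string $html): array
{
    if (trim($html) === '') {
        return [null, ''];              // nothing to parse
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is null when the markup contains no <img> element.
    $img = $doc->getElementsByTagName('img')->item(0);
    $src = $img !== null ? $img->getAttribute('src') : null;

    // textContent skips tags, so the <img> drops out of the text by itself.
    $root = $doc->documentElement;
    $text = $root !== null ? trim($root->textContent) : '';

    return [$src, $text];
}

// Stand-in for rows fetched from the database (assumed shape).
$rows = [
    ['content' => '<p><img src="story_img.jpg" width=120 height=80>Some text.</p>'],
    ['content' => '<p>No image here.</p>'],
];

foreach ($rows as $row) {
    [$imgOfText, $TextWithOutImg] = splitImageFromHtml($row['content']);
    echo ($imgOfText ?? '(no image)') . ' | ' . $TextWithOutImg . "\n";
}
```

With this shape, a row without an image yields null for the source instead of a fatal error, which also makes it easier to spot rows whose content is escaped.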
2

How about this Live Demo I whipped up? It's just some very basic parsing using strpos(). I'm sure this could be done with regular expressions, but I was never any good at those :)

CODE

<?php

    $html = '<p>';
    $html.= '    <img src="story_img.jpg" width=120 height=80>';
    $html.= '    In the last couple of weeks I often had to download a lot ';
    $html.= 'of files, submitted to a web-based teaching platform. Downloading ';
    $html.= 'all these files by hand is very annoying so I implemented a short ';
    $html.= 'Groovy script. Since Groovy has a great support for parsing well-';
    $html.= 'formed XML-like information it fails if you want to parse ';
    $html.= 'unstructured and nasty HTML code.';
    $html.= '</p>';

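    // Image source: the text between src=" and the next double quote.
    $spot = strpos($html, 'src="', strpos($html, '<img'))+5;
    $spot2 = strpos($html, '"', $spot);
    $imgOfText = substr($html, $spot, $spot2-$spot);

    // Text without the image: cut from '<img' up to and including the next '>'.
    $spot = strpos($html, '<img');
    $spot2 = strpos($html, '>', $spot)+1;
    $TextWithOutImg = substr($html,0,$spot).substr($html,$spot2);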

    echo "Image Source: ".$imgOfText."\n\n";
    echo "Text Without Image:\n".$TextWithOutImg;

?>

OUTPUT

Image Source: story_img.jpg

Text Without Image:

<p>In the last couple of weeks I often had to download a lot of files, submitted to a web-based teaching platform. Downloading all these files by hand is very annoying so I implemented a short Groovy script. Since Groovy has a great support for parsing well-formed XML-like information it fails if you want to parse unstructured and nasty HTML code.</p>

Dutchie432
  • What if, one day, he finally closes his `<img>` tag? :) – Shikiryu Feb 22 '11 at 22:28
  • @Capsule & @Mac Taylor : What if you use `>` in your text? What if your image is embedded like this : ``? etc...etc... You should use regex or regex shortcut to find things in HTML. – Shikiryu Feb 23 '11 at 07:10
  • @Shikiryu: You can use `>` in the text with no problem as the code is looking for the first `>` after `<img` – Dutchie432 Oct 12 '11 at 10:56
  • @Dutchie432 : I don't see any problem with my code since DomDocument takes `` and `` and I meant *shouldn't*. You don't use regex for HTML. Your solution is better than regex but it assumes a lot of things which could go wrong. (ie : ` text – Shikiryu Oct 12 '11 at 15:14
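To make the disagreement in these comments concrete, here is a rough comparison on a deliberately awkward input. The markup is a hypothetical example, not taken from the question: it uses a self-closing tag, a single-quoted `src`, and a `>` inside the `alt` attribute. The strpos() lines mirror the answer's cutting logic; the rest uses DOMDocument as in the other answer.

```php
<?php
// Hypothetical input: self-closing tag, single-quoted src, ">" inside alt.
$html = "<p><img alt=\"2 > 1\" src='story_img.jpg' />Text with a > sign.</p>";

// strpos() approach: cut from "<img" to the first ">" that follows it.
// Here that ">" sits inside the alt attribute, so part of the tag survives.
$start = strpos($html, '<img');
$end   = strpos($html, '>', $start) + 1;
$naive = substr($html, 0, $start) . substr($html, $end);

// DOMDocument approach: the parser handles quoting and self-closing tags.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$img  = $doc->getElementsByTagName('img')->item(0);
$src  = $img->getAttribute('src');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

echo "naive:    $naive\n";
echo "dom src:  $src\n";
echo "dom text: $text\n";
```

On this input the string-cutting result still contains the tail of the `<img>` tag, while the DOM-based variables hold the plain source and text. A `>` in the paragraph text alone would not trip the strpos() version; the trouble starts when one appears inside the tag itself.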
1

Try this topic: PHP - remove <img> tag from string
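The linked topic is not reproduced here. As a rough sketch of one way to drop the tag while keeping the rest of the markup, the snippet below removes the `<img>` node with DOMDocument and serializes only the paragraph. This is an illustration under assumptions rather than the linked topic's method, and the sample HTML (including the `<b>` element) is made up for the example.

```php
<?php
$html = '<p><img src="story_img.jpg" width=120 height=80>Some <b>bold</b> text.</p>';

libxml_use_internal_errors(true);      // keep parse warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$imgOfText = null;
$img = $doc->getElementsByTagName('img')->item(0);
if ($img !== null) {
    $imgOfText = $img->getAttribute('src');
    $img->parentNode->removeChild($img);   // detach the <img> from the tree
}

// Serialize only the <p>, leaving out the <html>/<body> wrappers
// that loadHTML() adds around a fragment.
$p = $doc->getElementsByTagName('p')->item(0);
$TextWithOutImg = $doc->saveHTML($p);

echo $imgOfText, "\n";
echo $TextWithOutImg, "\n";
```

Unlike `nodeValue`, which returns plain text, this keeps inline markup such as `<b>` in `$TextWithOutImg`, which may matter if the slider should show formatted text.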

zachallia
1

There are a number of PHP libraries that can parse HTML, even invalid HTML:

PHPQuery

Simple HTML DOM

Zend DOM Query

Here is a phpQuery example that prints the src of every img tag that appears on the Stack Overflow home page.

<?php

$html = file_get_contents('http://stackoverflow.com');

include('phpQuery.php');

$pq = phpQuery::newDocumentHTML($html, 'utf-8');

foreach ($pq->find('img') as $img)
{
    echo pq($img)->attr('src') .'<br>';

}

?>

Another example that extracts text of all paragraphs:

foreach ($pq->find('p') as $p)
{
    echo pq($p)->text() .'<br>';

}
Sergey Kornilov
  • Well, yeah, but why import a library to extract it when a _native_ (almost) one can do it? That's laziness, like using jQuery to get an element id. – Shikiryu Feb 23 '11 at 07:07
  • Shikiryu, if you need to parse HTML that is not well formed you are out of luck, while these libraries can do that. – Sergey Kornilov Mar 10 '11 at 05:45
  • Well, that isn't true. DOMDocument can parse malformed HTML; that's why it is powerful. For me, those libraries only have the advantage of being nicer to read and simpler to use. – Shikiryu Mar 10 '11 at 08:26
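As a small check of the claim in the last comment, the sketch below feeds DOMDocument a deliberately broken fragment. The markup is a made-up example with unquoted attributes and unclosed `<b>` and `<p>` tags. `libxml_use_internal_errors()` collects the parser's complaints instead of printing them as warnings.

```php
<?php
// Hypothetical malformed fragment: unquoted attributes, unclosed <b> and <p>.
$html = '<p><img src=story_img.jpg width=120>Some <b>bold text';

libxml_use_internal_errors(true);    // collect parse problems instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$problems = libxml_get_errors();     // array of LibXMLError objects, possibly empty
libxml_clear_errors();

// The parser repairs the tree, so the usual lookups still work.
$src  = $doc->getElementsByTagName('img')->item(0)->getAttribute('src');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

echo $src, "\n";
echo $text, "\n";
echo count($problems), " parser message(s) collected\n";
```

How many messages libxml reports for a given input can vary between versions, so the count is printed rather than assumed.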