part of body tag remains in file_get_contents output, how to remove it

Question

I am getting the content of a web-page using file_get_contents and part of the body tag remains in the output. I am also using strip_tags() to remove other html tags, but the partial body tag remains.

How can I remove it?

The output I am getting is body> and then content.

Here is my code:

$content = file_get_contents( $url );
$content = stristr( $content, "body" );
echo strip_tags($content);

What are you exactly about to do? Embedding the website into your page? Or do you want to extract special information from that page? — hek2mgl, Apr 02 '13 at 16:22

score 1 · Accepted Answer · answered Apr 02 '13 at 16:22

stristr returns the string starting from the index where the match begins, but you actually want the part that starts right after the match ends:

$content = substr($content, strpos($content, "<body>") + strlen("<body>"));

You also want to search for "<body>" rather than "body", as "body" might appear in the actual content. Since you're using strip_tags anyway, however, you can simply grab everything starting at the beginning of the body tag and it will work just fine:

$content = stristr($content, "<body>");

This will return the content starting with <body>, which will be stripped off by strip_tags.
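As a rough illustration of the difference, here is a minimal sketch run against a made-up page. The $html string and the variable names are assumptions for the example, not part of the original code. It also searches for "<body" without the closing ">", on the assumption that a real page may carry attributes on its body tag, in which case the literal "<body>" would not match.

```php
<?php
// Hypothetical sample page; in the question this would come from file_get_contents($url).
$html = '<html><head><title>Demo</title></head>'
      . '<body class="main"><p>Hello <b>world</b></p></body></html>';

// Searching for the bare word "body" drops the leading "<", so strip_tags()
// no longer sees a complete tag and leaves the fragment in the output.
$broken = strip_tags(stristr($html, "body"));

// Searching for "<body" keeps the whole opening tag, including any
// attributes, so strip_tags() removes it along with the other tags.
$fixed = strip_tags(stristr($html, "<body"));

// Alternative: cut explicitly after the ">" that closes the opening tag.
// stripos() returns false when there is no body tag; fall back to the full page.
$start = stripos($html, "<body");
$cut = ($start === false) ? $html : substr($html, strpos($html, ">", $start) + 1);
$alsoFixed = strip_tags($cut);

echo $fixed, "\n"; // Hello world
```

Both variants give the same text here; the stristr version is shorter, while the substr version makes the cut point explicit.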
