I am getting the content of a web-page using file_get_contents and part of the body tag remains in the output. I am also using strip_tags() to remove other html tags, but the partial body tag remains.

How can I remove it?

The output I am getting is body> and then content.

Here is my code:

$content = file_get_contents( $url );
$content = stristr( $content, "body" );
echo strip_tags($content);
IMUXIxD
  • What exactly are you trying to do? Embed the website in your page, or extract specific information from that page? – hek2mgl Apr 02 '13 at 16:22

1 Answer

stristr returns the string starting from the index where the match begins, but you actually want the text that starts right after the match ends:

$content = substr($content, strpos($content, "<body>") + strlen("<body>"));
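To make the offset arithmetic concrete, here is a minimal sketch. The helper name text_after is hypothetical (it is not a PHP built-in), and it assumes the needle appears literally in the input; it returns null when the needle is missing, because strpos returns false in that case.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns everything that
// follows the first occurrence of $needle, or null if it is not found.
function text_after(string $haystack, string $needle): ?string
{
    $pos = strpos($haystack, $needle);
    if ($pos === false) {
        // strpos signals "not found" with false, which must not be
        // used as an offset (it would silently behave like 0).
        return null;
    }
    // Start index of the match plus its length is the first
    // character after the match.
    return substr($haystack, $pos + strlen($needle));
}

// Sample input, made up for illustration.
$content = '<html><body>Hello</body></html>';
echo text_after($content, '<body>'), "\n"; // prints: Hello</body></html>
```

Adding a further + 1 to the offset would skip the first character of the content, so the sketch stops at the position plus the needle length.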

You also want to search for "<body>" rather than "body", as "body" might appear in the actual content. Since you're using strip_tags anyway, however, you can grab everything starting at the beginning of the body tag and it will work just fine:

$content = stristr($content, "<body>");

This will return the content starting with <body>, which will be stripped off by strip_tags.
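A real page may write the tag with attributes, such as <body class="home">, in which case an exact search for "<body>" finds nothing. The sketch below is one way to handle that, under stated assumptions: the function name body_text is hypothetical, it searches for "<body" without the closing bracket, and it falls back to stripping the whole document when no body tag is found.

```php
<?php
// Hypothetical helper; the name and the fallback behaviour are
// assumptions made for illustration.
function body_text(string $html): string
{
    // Case-insensitive search for the start of the opening body tag.
    // Searching for "<body" (no ">") also matches tags with attributes.
    $start = stripos($html, '<body');
    if ($start === false) {
        // No body tag found: strip tags from the whole input instead.
        return trim(strip_tags($html));
    }
    // Keep everything from "<body" onward; strip_tags then removes the
    // opening tag itself, attributes included.
    return trim(strip_tags(substr($html, $start)));
}

// Sample page, made up for illustration. The word "body" in the title
// is what a plain search for "body" would match first.
$html = '<html><head><title>body of work</title></head>'
      . '<body class="home"><p>Hello, <b>world</b>!</p></body></html>';
echo body_text($html), "\n"; // prints: Hello, world!
```

For anything beyond simple pages, an HTML parser such as PHP's DOMDocument is likely more robust than string searching.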

Adrian