0

I want to extract all text data from website and store that data into file for further process. I am using Curl library for this purpose. How can i extract only text from website using php. Please any one guide me i am extremely beginner.

2 Answers2

0

You can get the text data from website by using scraping tools

ChandraShekar
  • 386
  • 5
  • 12
0

You can function like below:

<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

  preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
  $tags = array_unique($tags[1]);

  if(is_array($tags) AND count($tags) > 0) {
    if($invert == FALSE) {
      return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    else {
      return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
    }
  }
  elseif($invert == FALSE) {
    return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
  }
  return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>

Copied from: https://www.php.net/manual/en/function.strip-tags.php#86964

Suresh Kamrushi
  • 15,627
  • 13
  • 75
  • 90