I use a rich text editor in my Android app, which works by converting rich text to HTML.

Now I want to build an abstract, made up of plain text and some images, from that HTML, so I decided to extract the plain text and images on the server side with PHP. I first tried to do it with regular expressions, but the patterns get very complex, and it seems too hard for an embedded engineer.

Could anyone give me some suggestions?

nhahtdh
yuan tian

2 Answers

You should avoid using regexes to parse HTML (see "How do you parse and process HTML/XML in PHP?" or "Using regular expressions to parse HTML: why not?"). Consider using a PHP HTML parsing library such as Simple HTML DOM, which the example here uses.

Example

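// Load the Simple HTML DOM library (the path is an assumption; adjust it to your project)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all images and print the URL of each one
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}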
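
If adding a third-party library is not an option, the same image extraction can probably be done with PHP's bundled DOM extension instead of Simple HTML DOM. The following is only a minimal sketch under some assumptions: the HTML is UTF-8, the DOM extension is enabled, and `extractImageSources` is a hypothetical helper name that does not come from any library.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> in an HTML fragment.
function extractImageSources(string $html): array
{
    $doc = new DOMDocument();

    // Editor output is often not strictly valid HTML, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the input is UTF-8. The XML declaration prefix is a common
    // workaround that makes loadHTML() read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src; // skip <img> tags without a usable src
        }
    }

    return $sources;
}

// Usage: print the image URLs found in a small sample fragment.
$sample = '<p>Hello <img src="a.png"> world <img src="b.jpg" alt="b"></p>';
print_r(extractImageSources($sample));
```

This parses the markup rather than pattern-matching it, so it avoids the regex problems mentioned above without needing an extra dependency.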
Community
GeorgeQ
Many thanks to alashow, who added an example using simplehtmldom (a third-party library). I used the library in my project and it works very well, except that it runs a little slowly.

Fetching all the plain text from the HTML takes just one line once the string is parsed:

 $keyDetailHTML = str_get_html($keyDetailXMLString);
 $keyTextString = $keyDetailHTML->plaintext;

Fetching all the images works just like the code alashow showed.
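
Since the library runs a little slowly for me, here is a rough sketch of how the plain-text part of the abstract might be built with PHP's bundled DOM extension instead. I have not benchmarked it against simplehtmldom, so any speed gain is only a guess. It assumes UTF-8 input and that the DOM and mbstring extensions are enabled; `extractPlainText` and its `$maxChars` parameter are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: return up to $maxChars characters of text from an HTML fragment.
function extractPlainText(string $html, int $maxChars = 100): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally; editor HTML is rarely strictly valid.
    $previous = libxml_use_internal_errors(true);
    // Assumption: UTF-8 input. The prefix makes loadHTML() treat it as such.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Gather every text node under <body>. Joining with a space keeps
    // adjacent paragraphs from running together, at the cost of also
    // splitting a word that is broken up by inline tags.
    $xpath = new DOMXPath($doc);
    $parts = [];
    foreach ($xpath->query('//body//text()') as $node) {
        $parts[] = $node->nodeValue;
    }

    // Collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));

    // mb_substr() counts characters, not bytes, so multibyte text is not cut mid-character.
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

// Usage: print a short abstract of a two-paragraph fragment.
echo extractPlainText('<p>Hello <b>rich</b> text</p><p>Second paragraph</p>', 50), "\n";
```

Note that this collects every text node, so the contents of any `<script>` or `<style>` element would be included too; that should not matter for typical rich text editor output, but it is worth checking against real data.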

yuan tian