-1

I have this HTML on my site:

$question_data = '<p>Gambar di bawah menunjukkan ciri-ciri haiwan berikut:</p>

<p><img src="/uploads/images/questions/96_20160303124007.PNG" /></p>

 <table border="1" cellpadding="1" cellspacing="1" style="width: 500px;">
    <tbody>
        <tr>
            <td style="text-align: center;">Beranak</td>
            <td>
            <p style="text-align: center;">Bertelur</p>
            </td>
        </tr>
    </tbody>
 </table> 
 <p>##a</p>';

This is my regex to extract 96_20160303124007.PNG:

define('GET_IMAGE_NAME_WITH_EXTENSION_PATTERN','/<img .*?src=(?:['\"])[^\"]*\/\K(.*?\.(?:jpeg|jpg|bmp|gif|png))(?:['\"]).*?>/');

$pattern = GET_IMAGE_NAME_WITH_EXTENSION_PATTERN;          

$arr_image_file_names = array();
preg_match_all($pattern, $question_data, $arr_image_file_names);

But I get no output. Does anyone know how to solve this?

Nere
    [Required reading.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Daedalus Mar 29 '16 at 06:29

2 Answers

2

It's common knowledge that regex is not the right tool for HTML parsing.

A solution without using regular expression:

$xml = new DOMDocument();
$xml->loadHTML($question_data);

// Collect every <img> element in the document.
$imgNodes = $xml->getElementsByTagName('img');

$arr_image_file_names = [];
for ($i = $imgNodes->length - 1; $i >= 0; $i--) {
    $imgNode = $imgNodes->item($i);
    // Keep only the file name (with extension) from the src path.
    $arr_image_file_names[] = pathinfo($imgNode->getAttribute('src'), PATHINFO_BASENAME);
}
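The same DOM-based idea can be wrapped in a small reusable function. This is only a sketch: the function name `extractImageFileNames` and the shortened sample markup are assumptions made for illustration, not part of the original answer. It also silences libxml parser warnings while loading, which is optional, and returns the names in document order rather than reversed.

```php
<?php
// Hypothetical helper (not from the original answer): returns the base name
// of every <img> src attribute found in an HTML fragment, in document order.
function extractImageFileNames(string $html): array
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            // pathinfo() strips the directory part and keeps "name.ext".
            $names[] = pathinfo($src, PATHINFO_BASENAME);
        }
    }

    return $names;
}

// Shortened sample input, assumed for illustration.
$html = '<p><img src="/uploads/images/questions/96_20160303124007.PNG" /></p>';
$names = extractImageFileNames($html);
print_r($names);
```

Because the file name is taken from the attribute value rather than matched by a pattern, the case of the extension does not matter here.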
Yann Milin
1

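Change your constant to the pattern below. It matches file names made of digits, an underscore, and more digits, followed by one of the listed extensions; the `i` modifier makes the match case-insensitive, so an uppercase `.PNG` is accepted: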

define('GET_IMAGE_NAME_WITH_EXTENSION_PATTERN', '/\d+_\d+\.(?:jpeg|jpg|bmp|gif|png)/i');
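For reference, here is a minimal sketch of how that constant could be used with `preg_match_all`. The sample `$question_data` is shortened for illustration, and the approach assumes every image name follows the digits, underscore, digits naming scheme seen in the question; names with any other shape would not be matched.

```php
<?php
define('GET_IMAGE_NAME_WITH_EXTENSION_PATTERN', '/\d+_\d+\.(?:jpeg|jpg|bmp|gif|png)/i');

// Shortened sample input, assumed for illustration.
$question_data = '<p><img src="/uploads/images/questions/96_20160303124007.PNG" /></p>';

$arr_image_file_names = [];
preg_match_all(GET_IMAGE_NAME_WITH_EXTENSION_PATTERN, $question_data, $arr_image_file_names);

// Index 0 of the result holds the full matches, one entry per file name found.
print_r($arr_image_file_names[0]);
```

With the sample input above, `$arr_image_file_names[0]` contains the single entry `96_20160303124007.PNG`.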
Mihai Matei