I'm confused. Here's my problem: I have text like this:

<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.

I'm trying to get the values inside the <ORGANIZATION> tags using this code:

function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    if(!empty($matches[1]))
        return $matches[1];
}

But this code only retrieves one value, from the last tag (officials of IPB), even though there are three <ORGANIZATION> tags.

I don't know how to modify this code to get all the values inside the tags without duplicates. Please help, thanks in advance. :D

andrefadila

2 Answers

preg_match will only return the first match, and your current code will fail if:

  • The tag is not uppercased in the same way
  • The tag's contents are on more than one line
  • There is more than one instance of the tag on the same line (the greedy .* then runs on to the last one, which is why you got "officials of IPB").
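To make the last point concrete, here is a small sketch that runs the pattern from the question and a lazy pattern against the same input. The <ORG> tag, the sample string, and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample text with two tags on one line.
$text = "<ORG>A</ORG> and <ORG>B</ORG>";

// Greedy: the first .* runs to the end of the line and backtracks only as far
// as it must, so the capture group ends up holding the last tag's content.
preg_match("/<ORG ?.*>(.*)<\/ORG>/", $text, $greedy);

// Lazy: [^>]* cannot leave the opening tag, and (.*?) stops at the nearest
// closing tag, so preg_match_all() finds every pair.
preg_match_all("/<ORG\b[^>]*>(.*?)<\/ORG>/is", $text, $lazy);

var_dump($greedy[1]); // the single string "B"
var_dump($lazy[1]);   // an array containing "A" and "B"
```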

Instead, try this:

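function get_text_between_tags($string, $tagname) {
    // Escape the tag name so any regex metacharacters in it are matched literally.
    $tag = preg_quote($tagname, '/');
    // i: case-insensitive; s: let . match newlines.
    // (.*?) is lazy, so each match stops at the nearest closing tag.
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/is";
    preg_match_all($pattern, $string, $matches);
    // $matches[1] holds the captured contents of every matched tag.
    if(!empty($matches[1]))
        return $matches[1];
    return array();
}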

This is an acceptable use of regexes for parsing, because it is a clearly defined case. Note, however, that it will fail if, for whatever reason, there is a > inside an attribute value of the tag.
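Since the question also asks for values without duplication, here is one possible way to use the function, as a sketch rather than the only approach. The input string is a made-up example in which one organization is tagged twice, and $unique is just an illustrative variable name.

```php
<?php
// Same regex approach as above, repeated here so the sketch runs on its own.
function get_text_between_tags($string, $tagname) {
    $pattern = "/<$tagname\b[^>]*>(.*?)<\/$tagname>/is";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

// Hypothetical input in which "Rector of IPB" is tagged twice.
$text = "<ORGANIZATION>Rector of IPB</ORGANIZATION> met "
      . "<ORGANIZATION>officials of IPB</ORGANIZATION> and the "
      . "<ORGANIZATION>Rector of IPB</ORGANIZATION> again.";

// array_unique() drops repeated values; array_values() reindexes from 0.
$unique = array_values(array_unique(get_text_between_tags($text, "ORGANIZATION")));

print_r($unique);
```

array_unique() compares values as strings, so entries that differ only in case or surrounding whitespace would still count as distinct.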

If you prefer to avoid the wrath of the pony, try this:

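function get_text_between_tags($string, $tagname) {
    // Custom tags such as <ORGANIZATION> are not valid HTML, so collect
    // libxml's parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // The HTML parser lowercases element names, so look up the lowercase name.
    $out = array();
    foreach ($dom->getElementsByTagName(strtolower($tagname)) as $tag) {
        $out[] = $tag->nodeValue;
    }
    return $out;
}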
Niet the Dark Absol

Have you tried the strip_tags() function?

<?php

    $s = "<ORGANIZATION>Head of Pekalongan Regency</ORGANIZATION>, Dra. Hj.. Siti Qomariyah , MA and her staff were greeted by <ORGANIZATION>Rector of IPB</ORGANIZATION> Prof. Dr. Ir. H. Herry Suhardiyanto , M.Sc. and <ORGANIZATION>officials of IPB</ORGANIZATION> in the guest room.";

    $r = strip_tags($s);

    var_dump($r);

?>

MISJHA