38

I need help with regex or preg_match because I am not very experienced with them yet. Here is my problem.

I need to get the value "get me", but I think my function has an error. The number of HTML tags is dynamic: the string can contain many nested HTML tags, such as a bold tag. The "get me" value is also dynamic.

<?php
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>(.*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
hakre
marknt15
  • possible duplicate of [Can you provide some examples of why it is hard to parse XML and HTML with a regex?](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege) – Brad Mace Jul 09 '11 at 20:53
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Paŭlo Ebermann Sep 15 '11 at 14:17

9 Answers

71
<?php
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>

That should do the trick.

takete.dk
  • 3
    The opening tag should be matched using <$tagname.*?> or <$tagname[^>]*>, not <$tagname ?.*>. As it is, it's greedy and will match a lot further than you hoped if there's more than one closing tag in the string. – Samir Talwar May 06 '09 at 10:02
  • 3
    Note that attribute values may contain a plain `>`. – Gumbo Sep 22 '09 at 05:33
  • 2
    This only works if there is only one tag of type `$tagname` on the same line. If there are multiple tags, it will grab the start to end of them both. This also won't work if the tag is spread across multiple lines. – teynon Feb 13 '13 at 04:37
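
The comments above point at the same set of fixes: consume the opening tag's attributes with `[^>]*` instead of a greedy `.*`, make the capture lazy, and add the `s` modifier so that `.` also matches newlines. A minimal sketch of that combination follows; the helper name `getAllTextBetweenTags` is hypothetical, and it assumes that no attribute value contains a `>` and that tags of the same name are not nested inside each other.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of
// every <$tagname ...>...</$tagname> pair found in $string.
function getAllTextBetweenTags(string $string, string $tagname): array {
    $tag = preg_quote($tagname, '/');
    // \b     keeps "font" from matching a longer name such as "fontface"
    // [^>]*  consumes attributes without running past the tag's own ">"
    // (.*?)  lazy capture, so each match stops at the nearest closing tag
    // s      lets "." match newlines; i ignores the case of the tag name
    $pattern = "/<{$tag}\\b[^>]*>(.*?)<\\/{$tag}>/si";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

$str = "<p><font size=\"10\">get me</font> and <font>me\ntoo</font></p>";
print_r(getAllTextBetweenTags($str, 'font'));
```

Using `preg_match_all` rather than `preg_match` returns one entry per tag pair, so a second `<font>` element on the same line is no longer swallowed into the first match.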
15

Try this

$str = '<option value="123">abc</option>
        <option value="123">aabbcc</option>';

preg_match_all("#<option.*?>([^<]+)</option>#", $str, $foo);

print_r($foo[1]);
pkwebmarket
    Yes, I know, but the previous answer does not work 100% correctly. Yesterday I had the same issue and tried the previous answer, but it returned only one tag's value and did not go on to the next tag. I have corrected this error and submitted a correct answer for new users. – pkwebmarket Jan 22 '12 at 12:45
8

In your pattern, you simply want to match all text between the two tags. Thus, you could use, for example, [\w\W] to match any character, including newlines.

function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>([\w\W]*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
Tomas Aschan
3

Since attribute values may contain a plain > character, try this regular expression:

$pattern = '/<'.preg_quote($tagname, '/').'(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>(.*?)<\/'.preg_quote($tagname, '/').'>/s';

But regular expressions are not suitable for parsing non-regular languages like HTML. You would be better off using a parser such as SimpleXML or DOMDocument.
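
As a rough illustration of the parser route, here is a minimal sketch using DOMDocument. The helper name `getTextOfFirstTag` is hypothetical, and the sketch assumes PHP's dom extension is enabled. The input is the question's string with a nested bold tag added, to show that nested markup does not get in the way.

```php
<?php
// Hypothetical helper: returns the text content of the first <$tagname>
// element in $html, or null if no such element exists.
function getTextOfFirstTag(string $html, string $tagname): ?string {
    // Collect parser warnings internally instead of emitting them; unknown
    // tags such as <textformat> would otherwise trigger PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName($tagname)->item(0);
    // textContent concatenates all descendant text, ignoring nested tags.
    return $node === null ? null : $node->textContent;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get <b>me</b></font></p></textformat>';
echo getTextOfFirstTag($str, 'font');
```

Because the parser builds a tree, attribute values containing `>` and nested tags are handled without any special cases in the calling code.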

Gumbo
1

This question might be old, but my answer might help someone.

You can simply use

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo strip_tags($str);

https://www.php.net/manual/en/function.strip-tags.php
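
One caveat worth sketching: strip_tags() removes every tag and keeps all of the remaining text, so it yields only "get me" when that is the only text in the string. The input below is made up for illustration and is not from the question.

```php
<?php
// Illustrative input (an assumption, not the question's string): there is
// text outside the <font> element as well.
$str = '<p align="left">intro <font size="10">get me</font></p>';

// strip_tags() drops the markup and keeps every text node, not just the
// text inside one particular tag.
$all = strip_tags($str);
echo $all, PHP_EOL;
```

Here the result includes "intro " as well, so this approach fits best when the whole string wraps a single piece of text.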

lawrence Da
0

The following PHP snippet returns the text between two tags/elements.

A regex of the form "/start_tag(.*?)end_tag/" captures the text between the tags.

For example:

$regex = '/\[start_tag_name\](.*?)\[\/end_tag_name\]/';
$content = "[start_tag_name]SOME TEXT[/end_tag_name]";
preg_match($regex, $content, $matches);
echo $matches[1];

It will print "SOME TEXT".

Dharman
0
$userinput = "http://www.example.vn/";
//$url = urlencode($userinput);
$input = @file_get_contents($userinput) or die("Could not access file: $userinput");
// \b instead of \s so that tags without attributes (e.g. <div>) also match
$regexp = "<tagname\b[^>]*>(.*)<\/tagname>";
//==Example:
//$regexp = "<div\b[^>]*>(.*)<\/div>";

// s: "." matches newlines, i: case-insensitive, U: quantifiers are lazy by default
if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach($matches as $match) {
        // $match[0] = the whole element, including its tags
        // $match[1] = the text between the tags
    }
}
Xman Classical
0

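Try $pattern = "#<($tagname)\b.*?>(.*?)</\\1>#s" and return $matches[2]. The pattern needs delimiters (# here), and the backreference must be written \\1 because "\1" in a double-quoted PHP string is an octal escape, not a backreference.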

Darren Li
  • Thank you for posting an answer to this question! Code-only answers are discouraged on Stack Overflow, because a code dump with no context doesn't explain how or why the solution will work, making it difficult for the original poster (or any future readers) to understand the logic behind it. Please, edit your question and include an explanation of your code so that others can benefit from your answer. Thanks! – Maximillian Laumeister Aug 06 '15 at 22:38
0

Your HTML

$html='<ul id="main">
    <li>
        <h1><a href="[link]">My Title</a></h1>
        <span class="date">Date</span>
        <div class="section">
            [content]
        </div>
    </li>
</ul>';

//function call you can change the tag name

echo contentBetweenTags($html,"span");

// this function will help you to fetch the data from a specific tag

function contentBetweenTags($content, $tagname){
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $content, $matches);
    
    if(empty($matches))
        return;
    
    $str = "<$tagname>".html_entity_decode($matches[1])."</$tagname>";
    return $str;
}
vaibhav kulkarni