-1

I have some HTML content and I would like to replace a tag:

<span class='c1'>MY TEXT</span>

and keep only MY TEXT. I tried:

$result = preg_replace('/(<span class=\'c1\'>)(.*)(<\/span>)/', '$2', $my_string);

But the closing tag still remains. Can you help me and EXPLAIN where my mistake is? I would like to improve. Thank you.

mck89
Yoong Kim
  • I guess .* matches closing tags too... – Carlo Moretti Jul 23 '12 at 07:47
  • It probably matches on the closing tag too with `(.*)`, as a test you could do something like `([^<]*)`. If this works, you could use lookaheads to get the result. – javex Jul 23 '12 at 07:47
  • Your code works fine for me, so I don't see what the problem is. As a side note, some of the parentheses are not needed; for example, you could use: `preg_replace('/<span class=\'c1\'>(.*)<\/span>/', '$1', $my_string);` – uınbɐɥs Jul 23 '12 at 07:49
  • http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html .... – KingCrunch Jul 23 '12 at 07:55
  • Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – F.P Jul 23 '12 at 07:56
  • Or just add the "U" modifier to the regex (ungreedy mode). – Jerska Jul 31 '12 at 13:50

5 Answers

2

Try using a lazy match (.*?) instead of a greedy match (.*).

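A greedy quantifier matches as much as possible, so if another </span> appears later in the string, (.*) runs through to that last one instead of stopping at the first. The closing tag you still see is the one swallowed inside the captured group. For example: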

Using a greedy match:

<span class='c1'>MY TEXT</span><span class='c1'>MY OTHER TEXT</span>
                 ^--greedy match will go from here to here--^

Using a lazy match:

<span class='c1'>MY TEXT</span><span class='c1'>MY OTHER TEXT</span>
                 ^-lazy^                        ^---lazy----^
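
The difference can be checked with a short script. This is only a sketch: the two-span input string is the hypothetical example from the diagram above, and the variable names `$greedy` and `$lazy` are made up for illustration.

```php
<?php
// Hypothetical input: two spans on one line, as in the diagram above.
$my_string = "<span class='c1'>MY TEXT</span><span class='c1'>MY OTHER TEXT</span>";

// Greedy: (.*) runs to the LAST </span>, so the first closing tag and the
// second opening tag end up inside the captured group and survive.
$greedy = preg_replace('/<span class=\'c1\'>(.*)<\/span>/', '$1', $my_string);

// Lazy: (.*?) stops at the FIRST </span>, so each span is unwrapped on its own.
$lazy = preg_replace('/<span class=\'c1\'>(.*?)<\/span>/', '$1', $my_string);

echo $greedy, "\n"; // MY TEXT</span><span class='c1'>MY OTHER TEXT
echo $lazy, "\n";   // MY TEXTMY OTHER TEXT
```

Note that `.` does not match newlines by default, so a span whose text wraps across lines would also need the `s` modifier.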
Jon Newmuis
0

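Try this, which is case-insensitive and accepts the class name with single, double, or no quotes. Note the lazy (.*?), so the match stops at the first closing tag:

$result = preg_replace('/(<span class=(\'|")?c1(\'|")?>)(.*?)(<\/span>)/i', '$4', $my_string);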
ke20
0

Your match-all group (.*) is greedy, so it probably consumes too much: it runs on to a later </span> in the trailing content, and the first closing tag stays in the result. You should try

$result = preg_replace('/(<span class=\'c1\'>)(.*?)(<\/span>)/', '$2', $my_string);

which uses an ungreedy match-all (.*?).

Dio F
0

Use Simple HTML DOM, a parser library that is well suited to manipulating HTML elements and avoids the pitfalls of matching tags with regular expressions.

Example:

require_once('simple_html_dom.php');
$html = str_get_html('<span class="c1">MY TEXT</span>');
$text = $html->plaintext;

Or if you have an entire HTML document (rather than just a snippet of HTML):

require_once('simple_html_dom.php');
$html = str_get_html('html goes here');
$text = $html->find('span.c1', 0)->plaintext; // Find text from first <span> with the class 'c1'

It's as simple as that.
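
If installing a third-party library is not an option, roughly the same idea can be sketched with PHP's built-in DOM extension instead. This swaps in `DOMDocument` and `DOMXPath` in place of Simple HTML DOM; the input string and the wrapping `<div>` are assumptions made for illustration, and the XPath test only matches spans whose class attribute is exactly `c1`.

```php
<?php
// Hypothetical input; any HTML fragment would do.
$my_string = "<p>Intro <span class='c1'>MY TEXT</span> and <span class='c2'>other</span></p>";

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
// Wrap the fragment in a <div> so it has a single root element, and skip
// the implied <html>/<body> wrappers and the doctype.
$doc->loadHTML('<div>' . $my_string . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach (iterator_to_array($xpath->query("//span[@class='c1']")) as $span) {
    // Move the span's children in front of it, then drop the now-empty span.
    while ($span->firstChild) {
        $span->parentNode->insertBefore($span->firstChild, $span);
    }
    $span->parentNode->removeChild($span);
}

// Serialize the wrapper's children back into a string.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

Unlike `plaintext`, this keeps the rest of the markup and removes only the matching span tags, which is closer to what the `preg_replace` attempt was aiming for.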

uınbɐɥs
0

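This collects the text of every matching span:

// Arguments are (pattern, subject, matches, flags); the matches land in $result.
preg_match_all('/<span(.*?)class=\'c1\'>(.*?)<\/span>/', $my_string, $result, PREG_PATTERN_ORDER);
// With PREG_PATTERN_ORDER, $result[2] holds every capture of the second group (the inner text).
for ($i = 0; $i < count($result[2]); $i++) {
   echo $result[2][$i];
}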
Druid
Indian