Using PHP to remove an HTML element from a string

Question

I am having trouble working out how to do this. I have a string that looks something like this:

    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";

I basically want to use something like preg_replace and a regex to remove

<em>This is some example text This is some example text This is some example text</em>

So I need to write some PHP code that will search for the opening `<em>` and closing `</em>` and delete all the text in between.

Hope someone can help. Thanks.

Yeah, the em is always there, and yes, I will end up with an empty `<p></p>`, but that's not an issue – lukehillonline Sep 23 '11 at 13:39

score 4 · Answer 1 · answered Sep 23 '11 at 13:36

$text = preg_replace('/([\s\S]*)(<em>)([\s\S]*)(</em>)([\s\S]*)/', '$1$5', $text);

arychj

this is along the lines of what I am looking for but I get this error Warning: preg_replace() [function.preg-replace]: Unknown modifier '>' – lukehillonline Sep 23 '11 at 13:43
Sorry, I forgot to escape the slash in the closing `(</em>)` group. `(</em>)` should be: `(<\/em>)` – arychj Sep 23 '11 at 13:47
gave it a go, it does nothing i am afraid – lukehillonline Sep 23 '11 at 13:49
What do you mean it does nothing? I get an output of: "
This is some example text This is some example text This is some example text

This is some example text This is some example text This is some example text
" – arychj Sep 23 '11 at 13:51
You do realize that (<\/em>) does not have a capital V in it right? it's a backslash '\' and then a forwardslash '/'... – arychj Sep 23 '11 at 13:52
i tried it m8 and it did nothing, i got out exactly what i put in - and yes i realise that thank you very much! – lukehillonline Sep 23 '11 at 13:53
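
The "Unknown modifier '>'" warning discussed above comes from `/` being the pattern delimiter: the unescaped `/` in `</em>` ends the pattern early, and PHP then tries to read `>` as a modifier. One way to sidestep the escaping is to pick a different delimiter such as `#`. The following is only a sketch of that idea; the shortened sample string is an assumption made for illustration, and it relies on there being a single `<em>` element.

```php
<?php
// Shortened sample input (illustrative, not the asker's exact string).
$text = "<p>First</p>\n<p><em>Remove me</em></p>\n<p>Last</p>";

// With '#' as the delimiter, the '/' in '</em>' needs no escaping.
// [\s\S]* matches any character, including newlines.
$result = preg_replace('#([\s\S]*)(<em>)([\s\S]*)(</em>)([\s\S]*)#', '$1$5', $text);

echo $result;
```

Only groups 1 and 5 (the text before and after the element) are kept, so the `<em>` element disappears and an empty `<p></p>` is left behind.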

Kalle H. Väravas · Answer 2 · 2011-09-23T13:47:26.403

2

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

preg_match("#<em>(.+?)</em>#", $text, $output);

echo $output[0]; // This will output it with em style
echo '<br /><br />';
echo $output[1]; // This will output only the text between the em

For this example to work, I changed the `<em>` contents a little; otherwise all your text is the same and you cannot really tell whether the script works.

However, if you want to get rid of the `<em>` element rather than extract its contents:

$text = '<p>This is some example text This is some example text This is some example text</p>
<p><em>This is the em text</em></p>
<p>This is some example text This is some example text This is some example text</p>';

echo preg_replace("/<em>(.+)<\/em>/", "", $text);

edited Sep 23 '11 at 13:47

answered Sep 23 '11 at 13:34

**Note:** This works on the assumption that there is only one `<em>` inside your string. – Kalle H. Väravas Sep 23 '11 at 13:40
i see so this text strips out the text but what does it leave behind, the actual text i am not interested in, I want to delete it from the string and be left with the rest of the text – lukehillonline Sep 23 '11 at 13:42
@AdriftUniform, my bad, I got your question a bit wrong. See the edit, it should do what you asked. – Kalle H. Väravas Sep 23 '11 at 13:48
Beware if you have stuff like multiline HTML. .+ doesn't match across newlines by default. It took me about an hour to finally discover [PCRE_DOTALL](http://www.php.net/manual/en/regexp.reference.dot.php) and the /s modifier. – lkraav Mar 29 '12 at 22:58
Very effective...was going to use php html dom class but this is far simpler and what I needed was even able to target the element by id...example: `echo preg_replace('/(.+)<\/em>/', "", $text);` – greaterKing May 26 '17 at 16:03
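
The two caveats raised in these comments (only one `<em>` per string, and `.` not matching newlines) can both be handled in the pattern itself. The sketch below is one possible variant, not the answerer's code: it assumes a lazy quantifier plus the `s` modifier (PCRE_DOTALL), and the input string is made up for illustration.

```php
<?php
// Illustrative input: two <em> elements, one of them spanning a line break.
$text = "<p><em>one</em> keep <em>two\nlines</em></p>";

// .*? is lazy, so each match stops at the nearest </em>
// (and it also matches an empty <em></em>).
// The s modifier (PCRE_DOTALL) lets . match newlines as well.
$result = preg_replace('#<em>.*?</em>#s', '', $text);

echo $result; // <p> keep </p>
```

With the greedy `.+` and no `s` modifier, the same input would be left unchanged, because no single line contains both an opening and a closing tag after the greedy match is attempted across the newline.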

Nertim · Accepted Answer · 2011-09-23T13:50:50.940

2

In case you are interested in a non-regex solution, the following would work as well:

<?php
    $text = "<p>This is some example text This is some example text This is some example text</p>
             <p><em>This is some example text This is some example text This is some example text</em></p>
             <p>This is some example text This is some example text This is some example text</p>";


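    $emStartPos = strpos($text, "<em>");
    // Search for the closing tag only after the opening tag.
    $emEndPos = ($emStartPos === false) ? false : strpos($text, "</em>", $emStartPos);

    // strpos() returns 0 for a match at the start of the string, so compare against false.
    if ($emStartPos !== false && $emEndPos !== false) {
        $emEndPos += 5; // include the closing </em> tag (5 characters) in the removed span
        $len = $emEndPos - $emStartPos;

        $text = substr_replace($text, '', $emStartPos, $len);
    }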

?>

This will remove the `<em>` tags and all the content between them.

edited Sep 23 '11 at 13:50

answered Sep 23 '11 at 13:42

nice, and if I added on to this a bit and put something like `preg_replace("<em>", " ", $text)` and `preg_replace("</em>", " ", $text)`, then that would get rid of the tags as well? – lukehillonline Sep 23 '11 at 13:45
if you don't want to preserve the tags, instead of $emStartPos += 4 do $emEndPos += 5 ('</em>' is 5 chars long) – Nertim Sep 23 '11 at 13:48
I can't believe you picked this answer. There is so much code, it's not neat, it's not optimal. – Kalle H. Väravas Sep 23 '11 at 13:53
@KalleH.Väravas I think AdriftUniform decided to use this because it's easier to read than regex, especially if one is not well versed in regex. I agree with you that regex does solve this in one line of code. In the background, though, the interpreter still has to analyze the regular expression and then perform the action on the text, so I am not sure whether the regular expression would be faster in this particular case. Perhaps AdriftUniform can run timing tests on each solution and use the one that is more efficient, especially if he/she is planning on processing many text blocks. – Nertim Sep 23 '11 at 15:05
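
A timing test along the lines suggested here could look something like the sketch below. The function names `removeEmRegex`, `removeEmStrpos` and `timeIt`, the sample string, and the iteration count are all hypothetical choices for illustration; the numbers will vary by machine and PHP version, so no results are claimed. Note that the regex variant removes every `<em>` element while the `strpos` variant removes only the first, so the comparison is only fair on input with a single element.

```php
<?php
// Illustrative input with a single <em> element.
$text = "<p>Intro</p>\n<p><em>Remove me</em></p>\n<p>Outro</p>";

// Regex variant (hypothetical helper name).
function removeEmRegex(string $text): string {
    return preg_replace('#<em>.*?</em>#s', '', $text);
}

// strpos/substr_replace variant (hypothetical helper name).
function removeEmStrpos(string $text): string {
    $start = strpos($text, '<em>');
    if ($start === false) {
        return $text;
    }
    $end = strpos($text, '</em>', $start);
    if ($end === false) {
        return $text;
    }
    return substr_replace($text, '', $start, $end + strlen('</em>') - $start);
}

// Runs $fn repeatedly and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, string $text, int $iterations): float {
    $begin = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($text);
    }
    return microtime(true) - $begin;
}

printf("regex:  %.4f s\n", timeIt('removeEmRegex', $text, 100000));
printf("strpos: %.4f s\n", timeIt('removeEmStrpos', $text, 100000));
```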

score 1 · Answer 4 · answered Sep 23 '11 at 13:41

Use strpos to find the position of the opening tag and strrpos to find the position of the last closing tag. Use substr to extract that part of the string, and then replace that substring with an empty string in the original string.
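
These steps might be sketched as follows. This is one reading of the description, not code from the answer: it assumes `strpos` locates the first opening tag and `strrpos` the last closing tag, and the sample string is invented for illustration.

```php
<?php
// Illustrative input with a single <em> element.
$text = "<p>Before</p>\n<p><em>Remove me</em></p>\n<p>After</p>";

$start = strpos($text, '<em>');    // position of the first opening tag
$end   = strrpos($text, '</em>');  // position of the last closing tag

if ($start !== false && $end !== false && $end > $start) {
    // Extract the element, tags included, then replace it with an empty string.
    $element = substr($text, $start, $end + strlen('</em>') - $start);
    $text = str_replace($element, '', $text);
}

echo $text;
```

Because `strrpos` finds the last closing tag, a string with several `<em>` elements would lose everything between the first opening tag and the last closing tag.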

marko

Why make it so complex, if you can do the whole thing with one function, one line, one match?! – Kalle H. Väravas Sep 23 '11 at 13:52
@Kalle Regular expressions are complex too. They can just be written in a very terse way. But the interpreter needs to parse and translate them. You just don't see the complexity because it happens behind the scenes. – Gordon Sep 23 '11 at 14:14
HTML can't be parsed with regex. What about comments or quoted strings that include the characters ? Or ………. – Peter Wooster Dec 15 '13 at 03:30
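
For input where regex matching is too fragile, a parser-based route is possible. The sketch below swaps in PHP's `DOMDocument` (it requires the DOM extension), which is not something any answer in this thread uses. The wrapper `<div>` and the libxml flags are assumptions chosen so that a fragment rather than a full document can be loaded; treat it as a starting point rather than a definitive implementation.

```php
<?php
// Illustrative input with a single <em> element.
$text = "<p>Before</p>\n<p><em>Remove me</em></p>\n<p>After</p>";

$doc = new DOMDocument();
// Wrap the fragment in a <div> so it has a single root element, and ask
// libxml not to add <html>/<body> wrappers or a doctype.
$doc->loadHTML('<div>' . $text . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// getElementsByTagName() returns a live list, so copy it before removing nodes.
foreach (iterator_to_array($doc->getElementsByTagName('em')) as $em) {
    $em->parentNode->removeChild($em);
}

// Serialize the children of the wrapper <div> back into a string.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}

echo $result;
```

`loadHTML` may re-serialize the markup slightly differently from the input (for example, entity encoding of non-ASCII text), so the output should be checked against real data before relying on it.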

Vikram Srivastava · Answer 5 · 2011-09-23T13:47:07.090

-4

$text = str_replace('<em>', '', $text);
$text = str_replace('</em>', '', $text);

edited Sep 23 '11 at 13:47

answered Sep 23 '11 at 13:33

Vikram Srivastava

OP does not want to strip all tags, but just the `<em>` tags and content. – Gordon Sep 23 '11 at 13:38
as said above this is not what I am looking for. I only want to remove the `<em>`, and I want to get rid of the text in between, which strip_tags will not do – lukehillonline Sep 23 '11 at 13:40
