2

I want to delete <h1>xxx yyyy zzz </h1> with PHP. But first, I want to check whether the string starts with <h1> and ends with </h1>.

Is there a function for this purpose?

if(string begins with '<h1>' and ends with '</h1>'){

    replace '<h1>xxx yyyy zzz </h1>' with NULL or an empty string

}
Anonymous
hakki

3 Answers

4

What about just using a regular expression?

$string = preg_replace( "/<h1>(.*?)<\\/h1>/", "", $string );

The `*?` quantifier makes the match non-greedy, so it stops at the first closing </h1> instead of the last one.
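To illustrate the difference, here is a minimal sketch comparing a greedy and a non-greedy pattern. The sample markup and the variable names are made up for this example. Note also that `.` does not match newlines unless the `s` modifier is added, so a heading that spans several lines would need `~<h1>.*?</h1>~s`.

```php
<?php
// Illustrative input with two headings; the markup is invented for this sketch.
$html = '<h1>first</h1><p>keep me</p><h1>second</h1>';

// Greedy: .* runs from the first <h1> to the LAST </h1>, swallowing the paragraph.
$greedy = preg_replace('~<h1>.*</h1>~', '', $html);

// Non-greedy: .*? stops at the nearest </h1>, so each heading is removed on its own.
$lazy = preg_replace('~<h1>.*?</h1>~', '', $html);

var_dump($greedy); // ''
var_dump($lazy);   // '<p>keep me</p>'
```

Using `~` as the delimiter avoids having to escape the `/` in `</h1>`.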

redolent
  • Is that a typo? should be `<\\/h1>` – mehmetseckin Feb 13 '14 at 21:03
  • 1
    Oops. Thanks for catching that. – redolent Feb 13 '14 at 21:03
  • 3
    This will also remove `skjgkdjjfkkjdfjdf jdfjdfdfjdf jfjdfjfd`. – Amal Murali Feb 13 '14 at 21:04
  • @amal-murali, Having the `*?` will prevent that issue – redolent Feb 13 '14 at 21:07
  • 3
    Regex is not a tool that can be used to correctly parse HTML. regex-infection wil​l devour your HT​ML parser,ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ See http://stackoverflow.com/a/1732454/325521 – Shiva Feb 13 '14 at 21:09
  • Why do people try to manipulate HTML with Regex? It's insane. Use a parser such as [Simple HTML DOM](http://simplehtmldom.sourceforge.net/) if you are using PHP – CommandZ Feb 17 '14 at 03:46
  • For small tasks, regex is more lightweight and possibly easier to read. – redolent Feb 17 '14 at 19:30
  • @Amal this won't remove because the `*?` operator is non-greedy – redolent Jul 29 '16 at 18:03
2

Regular expressions are not the right tool for this job. Use a DOM parser to parse HTML. Here's a solution using the built-in DOMDocument class.

$dom = new DOMDocument;
$your_html_string = '<h1>xxx yyyy zzz </h1>';
$dom->loadHTML($your_html_string);

$h1_tags = $dom->getElementsByTagName('h1');

// collect the elements first: the node list is live, so removing
// nodes while iterating over it directly would skip elements
$remove = array();
foreach ($h1_tags as $tag) {
    $remove[] = $tag;
}

// remove them
foreach ($remove as $tag) {
    $tag->parentNode->removeChild($tag);
}

// remove the DOCTYPE/html/body tags that DOM adds by default
$html = preg_replace(
    '~<(?:!DOCTYPE|/?(?:html|head|body))[^>]*>\s*~i', '', $dom->saveHTML()
);

echo $html;

Demo

Amal Murali
0
<?php
    $string = '<h1>AbcXycKasOkasdMpal</h1>';
    $pattern = '/<h1>.*<\/h1>/i';
    $replacement = '';
    echo preg_replace($pattern, $replacement, $string);
?>

Using a regular expression with PHP's preg_replace function, you can find every substring that starts with <h1> and ends with </h1> and replace it with an empty string.

If you don't want to replace, look into preg_match.
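For the "check first" part of the question, a preg_match call with anchors might look like the sketch below. The helper name `strip_if_h1_wrapped` is hypothetical, and tolerating surrounding whitespace is an assumption about what the asker wants.

```php
<?php
// Hypothetical helper: returns an empty string when the whole input starts
// with <h1> and ends with </h1>; otherwise returns the input unchanged.
function strip_if_h1_wrapped(string $s): string
{
    // \A and \z anchor to the very start and end of the string.
    // The s modifier lets . span newlines; i makes the tag match case-insensitive.
    // \s* tolerates leading/trailing whitespace (an assumption of this sketch).
    if (preg_match('~\A\s*<h1>.*</h1>\s*\z~is', $s) === 1) {
        return '';
    }
    return $s;
}

var_dump(strip_if_h1_wrapped('<h1>xxx yyyy zzz </h1>'));     // ''
var_dump(strip_if_h1_wrapped('<p>intro</p><h1>title</h1>')); // unchanged input
```

This only tests how the string begins and ends, so an input such as `<h1>a</h1><p>b</p><h1>c</h1>` also counts as a match. If the string must be exactly one heading, a DOM parser is the safer choice.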


Edit: changed to fix what @SyntaxLAMP pointed out.

Omar Himada