$test = array('<h1>text1</h1>','<h1><a href="#">text2</a><h1>','<h1>text3</h1><p>subtext3</p>');

I have a long text that I cut into small pieces with preg_split. I want to remove only the pieces that are wrapped in an h1 tag and contain no hyperlink.

I want to remove every piece that looks like: <h1>text1</h1> //wrapped only in h1, with no hyperlink.

And keep <h1><a href="#">text2</a><h1> and <h1>text3</h1><p>subtext3</p>

fish man
  • Please provide some example or try to be more clear because it's hard to tell what you want to do. – Lukas Knuth Aug 01 '11 at 01:00
  • @Lukas Knuth, I will put the `array` in a foreach, and I want to remove each short text that contains only `h1` and `text`, without `a, div, p, span`. Every `h1` that comes with other HTML tags will be printed from the foreach. Thanks. – fish man Aug 01 '11 at 01:08
  • I don't get it, please provide an example (edit your question). – Lukas Knuth Aug 01 '11 at 01:15
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Aug 01 '11 at 06:52

1 Answer


Use a loop to go through each array element and find each instance of the string "<". Then look at the next 3 characters. If they're "h1>", then you have the correct tag (closing tags, which start with "</", can simply be skipped). If you ever find a "<" followed by a different 3 characters, then it's not an "<h1>" HTML tag and you can remove this array element.

To remove the given element from the array, you can use unset($array[$index]). When you're done, I recommend re-indexing with array_values($array) to remove any gaps in the keys; a sort would also renumber the keys, but it reorders the elements.
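
The removal step can be sketched on its own. This is a minimal illustration with a made-up `$items` array (not part of the question); it uses `array_values()` for the renumbering because that keeps the remaining elements in their original order.

```php
<?php
// Made-up sample data, used only for illustration.
$items = array('a', 'b', 'c');

// Removing the middle element leaves the keys 0 and 2.
unset($items[1]);

// array_values() returns the remaining values with keys renumbered from 0,
// in their original order.
$items = array_values($items);

print_r($items); // keys are now 0 and 1, values 'a' and 'c'
```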

You'll want to use functions such as strpos to get the position of a string, and substr to get a subset of the given string. php.net is your friend :)
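
As a rough illustration of those two functions (the `$html` string is just a made-up sample): `strpos` returns the zero-based position of the first match, or `false` when there is none, so the result is best compared with `===` or `!==`.

```php
<?php
// Made-up sample string for illustration.
$html = '<h1>text1</h1>';

// Position of the first "<"; strpos returns false when nothing is found,
// so compare with !== rather than != (position 0 is a valid result).
$pos = strpos($html, '<');

if ($pos !== false) {
    // Three characters starting at that position.
    $tag = substr($html, $pos, 3);
    echo $tag, "\n"; // prints "<h1"
}

// Searching from offset 1 skips the opening tag and finds the "<" of "</h1>".
$next = strpos($html, '<', 1);
echo $next, "\n"; // prints 9
```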

Here is an example function which works with your $test array:

<?php
$test = array('<h1>text1</h1>','<h1><a href="#">text2</a><h1>','<h1>text3</h1><p>subtext3</p>');
function removeBadElements(&$array) {
    foreach($array as $k => $v) {
        // $v is a single array element
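        $offset = 0;
        do {
            // Find the next "<" at or after $offset.
            $pos = strpos($v, '<', $offset);
            if($pos === false) { break; } // no more tags in this element
            $offset = $pos + 1;

            $tag = substr($v, $pos, 3);
            $next = substr($v, $pos+1, 1);
            if($next == '/') { continue; } // closing tag: skip it
            if($tag == '<h1') { continue; } // opening h1 tag: allowed
            else {
                // Any other tag: drop this element and stop scanning it.
                unset($array[$k]);
                break;
            }
        } while($offset + 2 < strlen($v));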
    }
}
echo "\nORIG ARRAY:\n";
print_r($test);
removeBadElements($test);
echo "\n\n-------\nMODIFIED ARRAY:\n\n";
print_r($test);
?>
smdvlpr
  • It may be hard for me to write a complex `function`, so would you be willing to help me? Thanks. – fish man Aug 01 '11 at 01:02
  • `<h1 ...>blah</h1>` is still an h1 tag. You're better off using an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) – Peter Ajtai Aug 01 '11 at 01:08
  • @Peter Ajtai, in my text, all the h1 tags are plain, with no `id,class,style`. – fish man Aug 01 '11 at 01:11
  • @Ryan - if you know exactly what your h1 tags look like, you could try a regex like `/<h1>.*<\/h1>/` with preg_match => http://codepad.viper-7.com/OQwC5z ------ though that would fail if you have two H1s – Peter Ajtai Aug 01 '11 at 01:20
  • Well I'm assuming they're lowercase and always start like `<h1>` – smdvlpr Aug 01 '11 at 01:21
  • @fish man, I added 2 print_r statements to my code, and I'm now using strpos instead of stripos, in case your PHP version is older and doesn't support it. Regardless, this answer still works – smdvlpr Aug 01 '11 at 16:21
  • Thanks, I solved this this morning based on your answer, so I closed the question. – fish man Aug 01 '11 at 19:49
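
For readers who want the behavior described in the question (drop the elements that are nothing but a plain `<h1>` around text, keep everything else), the regex idea from the comments could be sketched roughly like this. It assumes, as stated in the comments, that the tags are lowercase and carry no attributes. The function name `removePlainH1` is hypothetical. For markup that is less regular, an HTML parser is the safer choice.

```php
<?php
$test = array('<h1>text1</h1>','<h1><a href="#">text2</a><h1>','<h1>text3</h1><p>subtext3</p>');

// Hypothetical helper: keeps every element except those that consist solely
// of <h1>plain text</h1> with no nested tags.
function removePlainH1(array $items) {
    $kept = array();
    foreach ($items as $item) {
        // ^ and $ anchor the whole string; [^<]* allows text but no further tags.
        if (preg_match('/^<h1>[^<]*<\/h1>$/', $item) === 1) {
            continue; // plain h1 only: drop it
        }
        $kept[] = $item;
    }
    return $kept;
}

print_r(removePlainH1($test));
```

With the `$test` array from the question, only the first element matches the pattern, so the second and third elements are kept.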