PHP ->preg_match_all for following structure <h6>my headline</h6>some text ... <h6>another headline</h6> more text

Question

I'm desperately looking for a way to get this text string

<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...

parsed into a PHP array.

I need to separate it into

1.
1.0=> First pane
1.1=> ... pane content ... 

2.
2.0=> Second pane
2.1=> Hi, this is a comment.
    To delete a comment, just log in and view the post's comments.
    There you will have the option to edit
    or delete them.

3.
3.0=> Last pane
3.1=> ... last pane content ...

*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, Dec 03 '10 at 16:08

score 1 · Answer 1 · answered Dec 03 '10 at 16:23

Your regex should look like this:

/<h6>([^<]+)<\/h6>([^<]+)/im

If you run the following script, you'll see that the values you're looking for are in $matches[1] and $matches[2].

$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";

$matches = array();
preg_match_all($r,$s,$matches);

print_r($matches);
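The two capture groups come back as parallel arrays, so they can be paired up into the headline/content structure the question asks for. The following is only a minimal sketch under the same assumptions as the script above (same input, same regex); the `$panes` name and the `trim()` call are additions for illustration and not part of the original answer.

```php
<?php
$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";

$matches = array();
preg_match_all($r, $s, $matches);

// $matches[1] holds every headline, $matches[2] the text that follows each one.
// Pair them index by index; trim() removes the surrounding line breaks.
$panes = array();
foreach ($matches[1] as $i => $headline) {
    $panes[] = array($headline, trim($matches[2][$i]));
}

print_r($panes);
```

Each entry of `$panes` then has the headline at index 0 and the content at index 1, which corresponds to the 1.0/1.1, 2.0/2.1 layout in the question.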

01001111

Thanks. That almost works. I'm just missing the actual content; in my example I named it 1.1, 2.1 and 3.1. Any idea how I can get that? – chris Dec 03 '10 at 16:31
Hi, I'm sorry, your code works. I copied the content of your variable $s and it worked. Unfortunately, I checked the input of my $s and it looks like this:
First pane

… pane content …

Second pane

Hi, this is a comment.
To delete a comment, just log in and view the post’s comments.
There you will have the option to edit
or delete them.

Last pane

… last pane content …
Any idea how you could get this working?
– chris Dec 03 '10 at 16:42
For anything more complex than your initial example, you really shouldn't be using a regex; use a DOM parser instead. – 01001111 Dec 03 '10 at 17:33

Richard H · Answer 2 · 2010-12-03T16:30:34.250

1

You shouldn't be attempting to parse HTML with a regex. This is doomed to cause much pain and unhappiness for all but the very simplest HTML, and it will break as soon as anything in your document structure changes. Use a proper HTML or DOM parser instead, such as PHP's DOMDocument http://php.net/manual/en/class.domdocument.php

For example, you can use getElementsByTagName http://www.php.net/manual/en/domdocument.getelementsbytagname.php to get all the h6 elements.
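As a rough illustration of that suggestion, the sketch below loads a shortened version of the question's input into DOMDocument and collects the headline text. The `$html` and `$headlines` names are made up for this example, and `libxml_use_internal_errors(true)` is only there to keep parser warnings about the incomplete document quiet.

```php
<?php
$html = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
<h6>Last pane</h6>
... last pane content ...";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// loadHTML() accepts a fragment and wraps it in <html><body> itself.
$doc->loadHTML($html);

$headlines = array();
foreach ($doc->getElementsByTagName('h6') as $h6) {
    $headlines[] = $h6->textContent;
}

print_r($headlines);
```

This only retrieves the headlines; the unwrapped text between them has to be reached through the nodes that follow each h6.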

edited Dec 03 '10 at 16:30

answered Dec 03 '10 at 16:24

Thanks for the tip. I have used that class for more advanced stuff, but in this case I really just need to parse the example above. If it were only the <h6> I wanted, that would be pretty easy; I'm just failing to get both the headline and the content below.
– chris Dec 03 '10 at 16:32
I'd still advise using that. It's not much more code to write, and I haven't tested the regex posted by 01001111, but it looks like it will break if you have a "<" in your text. – Richard H Dec 03 '10 at 16:41
Do you have an idea how I select the text with that class? Selecting the h6 is pretty easy, but I also need the text which is not wrapped in any tag. It's like in my example above: headline-text/headline-text ... – chris Dec 03 '10 at 17:17
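One possible way to reach text that is not wrapped in any tag is to walk the sibling nodes after each h6 until the next h6 appears. This is a sketch rather than a tested recipe for arbitrary HTML: it assumes the headlines and their text sit next to each other at the same level, and the `<b>` tag in the sample input is added here only to show that inline markup (where a `[^<]+` regex would stop) does not cut the content short.

```php
<?php
$html = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a <b>comment</b>.
<h6>Last pane</h6>
... last pane content ...";

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$panes = array();
foreach ($doc->getElementsByTagName('h6') as $h6) {
    $content = '';
    // Follow the siblings of this headline until the next h6 or the end.
    for ($node = $h6->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h6') {
            break;
        }
        // textContent works for text nodes and for elements such as <b>.
        $content .= $node->textContent;
    }
    $panes[] = array($h6->textContent, trim($content));
}

print_r($panes);
```

The result has the same headline/content pairing as the regex approach, without depending on the content being free of "<".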

Alan Moore · Answer 3 · 2010-12-03T17:39:19.940

I believe the PREG_SET_ORDER flag is what you're looking for.

$regex = '~<h6>([^<]+)</h6>\s*([^<]+)~i';

preg_match_all($regex, $source, $matches, PREG_SET_ORDER);

This way, each element in the $matches array is an array containing the overall match followed by all of the group captures for a single match attempt. The result up to the first match looks like this:

Array
(
    [0] => Array
        (
            [0] => <h6>First pane</h6>
... pane content ...

            [1] => First pane
            [2] => ... pane content ...

        )

see it in action on ideone

EDIT: Notice the \s* I added, too. Without that, the matched content always starts with a line separator.
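To make both points concrete, here is a small sketch that runs the regex with and without the \s* and then loops over the PREG_SET_ORDER result. The shortened `$source` string and the variable names `$with`, `$without`, `$a` and `$b` are assumptions made for this example.

```php
<?php
// Explicit "\n" so the line breaks do not depend on the file's line endings.
$source = "<h6>First pane</h6>\n... pane content ...\n"
        . "<h6>Last pane</h6>\n... last pane content ...";

$with    = '~<h6>([^<]+)</h6>\s*([^<]+)~i';
$without = '~<h6>([^<]+)</h6>([^<]+)~i';

preg_match_all($with, $source, $a, PREG_SET_ORDER);
preg_match_all($without, $source, $b, PREG_SET_ORDER);

// With \s*, the line break after </h6> is consumed before group 2 starts.
var_dump($a[0][2]);
// Without it, group 2 begins with that line break.
var_dump($b[0][2]);

// Each $match is one pane: [0] whole match, [1] headline, [2] content.
foreach ($a as $match) {
    echo $match[1], ' => ', rtrim($match[2]), "\n";
}
```

The trailing line break before the next `<h6>` is still captured in group 2, which is why the loop uses `rtrim()`.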
