This is a question for the regex gurus.
If I have a series of nested XML nodes, I would like to parse out (using regex) the text values that sit directly inside my current node, on the same level as its child nodes. For instance, if I have:
<top-node>
Hi
<second-node>
Hello
<inner-node>
</inner-node>
</second-node>
Hey
<third-node>
Foo
</third-node>
Bar
</top-node>
I would like to retrieve an array that is:
array(
1 => 'Hi',
2 => 'Hey',
3 => 'Bar'
)
I know I can start with
preg_match('~<(\S+).*?>(?P<inside>(.|\s)*)</\1>~', $original_text, $matches);
$inside = $matches['inside'];
and that will retrieve the text sans the top-node.
However, the next step is a bit beyond my regex abilities.
EDIT: Actually, that preg_match appears to work only if the $original_text is all on the same line. Additionally, I think I can use a preg_split with a very similar regex to retrieve what I am looking for; it just isn't working across multiple lines.
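For illustration, here is a minimal sketch of how the preg_split idea might be made to work across multiple lines. It is not a definitive solution: the function name topLevelText is hypothetical, the patterns are simplified variants of the one above, and the sketch assumes the s (PCRE_DOTALL) modifier is what makes "." match newlines. It also assumes that child nodes never contain a nested node of the same name, and that there are no self-closing tags, comments, or CDATA sections.

```php
<?php
// Hypothetical helper (not part of any library): returns the text that sits
// directly inside the outermost node, skipping text inside child nodes.
function topLevelText(string $xml): array
{
    // The "s" modifier lets "." match newlines, so the pattern spans lines.
    if (!preg_match('~^\s*<([^\s>]+)[^>]*>(?P<inside>.*)</\1>\s*$~s', $xml, $m)) {
        return [];
    }

    // Split on each child element, from its opening tag to the first closing
    // tag of the same name (assumption: no same-named nesting).
    $pieces = preg_split('~<([^\s>/]+)[^>]*>.*?</\1>~s', $m['inside']);
    if ($pieces === false) {
        return [];
    }

    // Trim surrounding whitespace and drop pieces that end up empty.
    $pieces = array_filter(
        array_map('trim', $pieces),
        static function ($piece) {
            return $piece !== '';
        }
    );

    // Re-index from 0 (the desired array in the question starts at 1).
    return array_values($pieces);
}

$original_text = "<top-node>\nHi\n<second-node>\nHello\n<inner-node>\n</inner-node>\n</second-node>\nHey\n<third-node>\nFoo\n</third-node>\nBar\n</top-node>";

print_r(topLevelText($original_text)); // 'Hi', 'Hey', 'Bar' for this sample
```

The split pattern is lazy (.*?) so that each child node ends at its own closing tag instead of swallowing everything up to the last closing tag in the text.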
NOTE: I appreciate and will oblige any requests for clarification; however, my question is pretty specific and I mean what I am asking, so don't give an answer like "go use SimpleXML" or something. Thank you for any and all assistance.