
I want to match "(<tag>(.*?)</tag>)", but only the innermost (deepest) child, so for example the outer elements in <tag><tag>...</tag></tag> should be ignored.

Is it possible to do this using only one regexp in PHP?

Code:

------------------------

<tag>
    <tag>
        <tag>
            <i> abcd </i>
        </tag>
    </tag>
</tag>

<tag>
    <tag>
        efgh
    </tag>
</tag>

------------------------

<tag>
    ijkl
</tag>

<tag>
    mnop
</tag>

------------------------
halfer
user3383675
  • [Don't use regexes for parsing HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – John Conde Aug 07 '14 at 13:36
  • I think that using XQuery is a better option for this – Federico Piazza Aug 07 '14 at 13:37
  • I know that it is not the best method, but I have to break this rule. :P – user3383675 Aug 07 '14 at 13:39
  • If you need to break a rule, it is helpful to explain why, so readers can have a better idea of how to advise. – halfer Aug 07 '14 at 13:47
  • Because a very old program accepts only a single regexp string as its argument. The program does too many things as a black box, so I cannot simply rewrite its source or write a new program. It is usually fed text files without HTML or any other tagged language, but very rarely (as now) I want to give it a file written in a tagged language. ;P – user3383675 Aug 07 '14 at 13:57

1 Answer

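You can use this regex with the multiline flag (so that $ matches at the end of each line) and read the first capturing group. The left alternative consumes every line that contains <tag> or </tag>, so only the remaining lines can reach the group: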

.*<\/?tag>|(.*<\/.*?>)$


By the way, if you want to capture the full line of whatever content is inside the tags (like efgh and mnop), you could use this one:

.*<\/?tag>|(.+)$
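
As a rough illustration, here is how the two patterns above might be checked in PHP with `preg_match_all`. This is only a sketch: the helper name `innermostLines` and the `$input` array are hypothetical, the `m` modifier is assumed, and the input assumes every `<tag>` and `</tag>` sits on its own line, as in the question's sample.

```php
<?php
// Hypothetical input modelled on the question's sample, joined with "\n".
$input = implode("\n", [
    '<tag>',
    '    <tag>',
    '        <tag>',
    '            <i> abcd </i>',
    '        </tag>',
    '    </tag>',
    '</tag>',
    '',
    '<tag>',
    '    <tag>',
    '        efgh',
    '    </tag>',
    '</tag>',
    '',
    '<tag>',
    '    ijkl',
    '</tag>',
    '',
    '<tag>',
    '    mnop',
    '</tag>',
]);

// Hypothetical helper: run a pattern and return the non-empty,
// whitespace-trimmed contents of capturing group 1.
function innermostLines(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    // Matches made by the left alternative leave group 1 as an empty string.
    $hits = array_filter($matches[1], function ($s) {
        return $s !== '';
    });
    return array_values(array_map('trim', $hits));
}

// First pattern: only lines that end with a closing tag other than </tag>.
$tagged = innermostLines('/.*<\/?tag>|(.*<\/.*?>)$/m', $input);

// Second pattern: any non-empty line that does not contain <tag> or </tag>.
$anyLine = innermostLines('/.*<\/?tag>|(.+)$/m', $input);

print_r($tagged);
print_r($anyLine);
```

With this input, the first pattern should return only the `<i> abcd </i>` line, while the second should also return `efgh`, `ijkl` and `mnop`. Since the legacy program takes just the regexp string, the filtering of empty group values done here would have to happen on its side.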


Federico Piazza