I have XML in a string like this:

<a:b>
  <a:c></a:c>
  <a:c></a:c>
</a:b>

and I would like to remove all elements in namespace 'a'.

Of course this can be done this way:

/<\ba\b:.*?>.*?<\/\ba\b:.*?>/

But in this case the result is:

  <a:c></a:c>
</a:b>

because the first closing tag for namespace 'a' actually belongs to a child element. So doing it recursively would leave '' inside my string.

So the idea I had was to use a captured group inside the pattern, the same way you can use it in the replacement:

/<(\ba\b:.*?)>.*?<\/$1>/

This pattern does not work; it is only meant to illustrate the idea.

All your help, ideas, ... are very welcome. Thank you for your answer in advance.

M.V.
  • Don't use a regex for this, use an XML parser. – nickb Jun 08 '16 at 11:56
  • This is one of the reasons not to use regexes with HTML/XML. Use a parser, http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php. – chris85 Jun 08 '16 at 12:01
  • OK! I can do it, but the idea is to not use an XML parser... – M.V. Jun 08 '16 at 12:05
  • Well that's a bad idea then :) – nickb Jun 08 '16 at 12:11
  • Not a bad idea; Just write your own XML parser... :-) – KIKO Software Jun 08 '16 at 12:12
  • OK ... So there is no way to use it the way I tried? – M.V. Jun 08 '16 at 12:14
  • The performance of XML parsers is unpredictable; it depends heavily on the library's implementation. Using a regex can be a good choice when you want performance at all costs, mainly at the cost of readability and maintainability. If you have no performance issue right now, it would be silly not to use an XML parser. – JesusTheHun Jun 08 '16 at 12:19

1 Answer

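To answer the question directly, here is a reduced working example. The named group nodeName captures the element name, and (?P=nodeName) refers back to it inside the pattern: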

(?P<nodeOpen><a:(?P<nodeName>.*)>)(?P<data>.*)(?P<nodeClose><\/a:(?P=nodeName)>)

Used in PHP code:

preg_match('#(?P<nodeOpen><a:(?P<nodeName>.*)>)(?P<data>.*)(?P<nodeClose><\/a:(?P=nodeName)>)#s', $xml, $matches);

/* Produces:

array(9) {
  [0]=>
  string(43) "<a:b>
  <a:c></a:c>
  <a:c></a:c>
</a:b>"
  ["nodeOpen"]=>
  string(5) "<a:b>"
  [1]=>
  string(5) "<a:b>"
  ["nodeName"]=>
  string(1) "b"
  [2]=>
  string(1) "b"
  ["data"]=>
  string(32) "
  <a:c></a:c>
  <a:c></a:c>
"
  [3]=>
  string(32) "
  <a:c></a:c>
  <a:c></a:c>
"
  ["nodeClose"]=>
  string(6) "</a:b>"
  [4]=>
  string(6) "</a:b>"
}

*/

Then concatenate the opening and closing tags to get the emptied node:

$emptyNode = $matches['nodeOpen'] . $matches['nodeClose'];
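
The match-then-concatenate steps above can also be folded into a single replacement pass. The following is only a sketch under stated assumptions: the helper name emptyPrefixedNodes is hypothetical, the name group is narrowed to [^\s>]+ and the data group is made lazy (both are changes from the pattern above), and it assumes opening tags carry no attributes and that an element never contains a descendant with the same name.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): empties every
 * outermost element with the given prefix, keeping only its tags.
 */
function emptyPrefixedNodes(string $xml, string $prefix): string
{
    // Escape the prefix so it is treated literally inside the pattern.
    $p = preg_quote($prefix, '#');

    // (?P=nodeName) is a back-reference: the closing tag must carry the
    // same name that the opening tag captured. The "s" flag lets "."
    // match newlines.
    $pattern = '#(?P<nodeOpen><' . $p . ':(?P<nodeName>[^\s>]+)>)'
             . '(?P<data>.*?)'
             . '(?P<nodeClose></' . $p . ':(?P=nodeName)>)#s';

    // Replace each match with its opening tag followed by its closing tag.
    return preg_replace_callback(
        $pattern,
        static fn (array $m): string => $m['nodeOpen'] . $m['nodeClose'],
        $xml
    );
}

// Made-up input: one element in prefix "a", one in another prefix.
$xml = "<root><a:b>\n  <a:c></a:c>\n</a:b><x:y>kept</x:y></root>";
echo emptyPrefixedNodes($xml, 'a'), "\n";
```

Elements outside the chosen prefix are left untouched. Same-name nesting such as <a:b><a:b></a:b></a:b> would be cut at the first closing tag, which is one of the cases where a parser is the safer tool.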

It can be tested live here: https://regex101.com/r/xX1uZ9/2
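
For comparison with the $1 idea in the question: inside a PCRE pattern a numbered group is referenced as \1 (the $1 form only applies to the replacement string). A minimal sketch that removes the whole element, assuming tags without attributes and no same-name nesting; the variable names are illustrative only:

```php
<?php
// Sample input from the question.
$xml = "<a:b>\n  <a:c></a:c>\n  <a:c></a:c>\n</a:b>";

// ([^\s>]+) captures the name after "a:"; \1 requires the same name in
// the closing tag. The "s" flag lets "." match newlines.
$pattern = '#<a:([^\s>]+)>.*?</a:\1>#s';

// Remove every matching element, including its children.
$result = preg_replace($pattern, '', $xml);
var_dump($result === ''); // the outer <a:b> element and its children are gone
```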

I also recommend the awesome talk by the amazing spiderman Jordi Boggiano, available on YouTube ( https://www.youtube.com/watch?v=ayo8zDnd-m8 )

JesusTheHun
  • Thanks a lot for your answer! – M.V. Jun 09 '16 at 07:39
  • Please keep in mind that this is a blazing fast but hardly maintainable solution. Again, if performance is not an issue right now, you definitely should use an XML parser! No kidding, just do it. – JesusTheHun Jun 09 '16 at 08:33