0

I need help with getting a part of an XML file with tags like this:

<SomeTag><![CDATA[TEXT I WANT HERE]]></SomeTag>

I've been playing around with RegExp for this and can't get it right. Can you suggest the proper way, please?

EDIT:
Not interested in XML parsing for this particular case.
The tag can be anything, not just "SomeTag". The same goes for "TEXT I WANT HERE".

Thank you.

Francisc
  • possible duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html) – Gordon Mar 24 '11 at 11:04
  • possible duplicate of [Best XML Parser for PHP](http://stackoverflow.com/questions/188414/best-xml-parser-for-php/3616044) – Gordon Mar 24 '11 at 11:04
  • This is not about XML parsing; that was just an example. I want to get better at RegExp. Not a real project. :) – Francisc Mar 24 '11 at 11:11
  • 2
    *"I need help with getting a part of an XML file with tags like this"* **is** about XML parsing, especially since you say the tags are generic in one of the comments below. Regex can parse XML, but you don't want to do that when there are parsers available that will do this much more reliably than your custom regex. – Gordon Mar 24 '11 at 11:13
  • Not necessarily, in my opinion. It may be better to deal with XML with an XML parser, but I want to know how to do this with RegExp. It's a valid question in my opinion. – Francisc Mar 24 '11 at 11:14

4 Answers

2

Using http://regexp.zug.fr/, I wrote a very simple pattern in five seconds:

preg_match_all('`<!\[CDATA\[(.*?)\]\]>`s', $source, $matches);
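Why the lazy `.*?` matters shows up as soon as the input holds more than one CDATA section. The following is a minimal sketch with made-up input (the `$source` string and the variable names `$lazy` and `$greedy` are illustrative, not from the question):

```php
<?php
// Illustrative input (not from the thread): two CDATA sections in one string.
$source = '<a><![CDATA[first]]></a><b><![CDATA[second]]></b>';

// Lazy quantifier: each match stops at the nearest "]]>".
preg_match_all('~<!\[CDATA\[(.*?)\]\]>~s', $source, $lazy);

// Greedy quantifier: a single match runs on to the last "]]>" in the input.
preg_match_all('~<!\[CDATA\[(.*)\]\]>~s', $source, $greedy);

var_dump($lazy[1]);   // two separate payloads
var_dump($greedy[1]); // one over-long capture spanning both sections
```

The `s` modifier lets `.` cross line breaks, which is likely what you want if a CDATA payload can span several lines.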
Yoann
  • 1
    Another fine list of tools is here: http://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world – mario Mar 24 '11 at 11:08
  • 3
    At least match against the full CDATA start and end tag. You can have a nested `]` inside the CDATA (which will break your regex), but you can't have a nested `]]>`. So change your regex to `#<!\[CDATA\[(.*?)\]\]>#` if you want to use one. – ircmaxell Mar 24 '11 at 11:14
1

I suggest you use SimpleXML.

Raffael
  • 1
    Care to explain why instead of just throwing your opinion at the OP? – Gordon Mar 24 '11 at 11:06
  • Hi, thank you. I know about XML parsing, I am however curious about RegExp for this particular case. – Francisc Mar 24 '11 at 11:13
  • @Gordon: ... ? ... "possible duplicate of Best XML Parser for PHP – Gordon 6 mins ago" – Raffael Mar 24 '11 at 11:13
  • @Gordon: you are implicitly seconding my suggestion. – Raffael Mar 24 '11 at 11:19
  • 2
    No, I'm not. There are more answers than just the accepted answer in there. Also, I didn't complain about you using SimpleXML because I think it's right or wrong. I complained about you not giving an explanation for suggesting it. – Gordon Mar 24 '11 at 11:21
0

As everybody else suggested, use an XML parser for this job. The code below shows how to do it with regex, but it's not the proper way of doing things!

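$string = '<SomeTag><![CDATA[TEXT I WANT HERE]]></SomeTag>';

// Match the CDATA payload inside one specific tag (case-insensitive).
preg_match_all('/<sometag><!\[CDATA\[(.*?)\]\]><\/sometag>/is', $string, $matches);
var_dump($matches);
// Match any CDATA payload, regardless of the enclosing tag.
preg_match_all('/<!\[CDATA\[(.*?)\]\]>/s', $string, $matches);
var_dump($matches);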
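Since the question says the tag name can be anything, one possible variation is to capture the tag name and require the closing tag to repeat it via a backreference. This is only a sketch: `extractCdata` is a hypothetical helper name, and it assumes tags without attributes whose names consist of word characters.

```php
<?php
// Hypothetical helper (not from the thread): extracts CDATA payloads wrapped in any tag.
function extractCdata(string $source): array
{
    // <(\w+)>  opening tag, name captured as group 1 (assumes no attributes)
    // (.*?)    lazy capture of the CDATA payload (group 2); "s" lets "." span newlines
    // </\1>    closing tag must repeat the captured tag name
    $pattern = '~<(\w+)><!\[CDATA\[(.*?)\]\]></\1>~s';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = ['tag' => $m[1], 'text' => $m[2]];
    }
    return $result;
}

// Made-up sample input; the second payload contains a lone "]".
$xml = '<SomeTag><![CDATA[TEXT I WANT HERE]]></SomeTag><Other><![CDATA[a ] b]]></Other>';
print_r(extractCdata($xml));
```

Using `~` as the delimiter avoids having to escape the `/` in the closing tag.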
Poelinca Dorin
0

You don't even need RegEx for this. Simple strpos is enough:

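$prefix = '<SomeTag><![CDATA[';
$start = strpos($text, $prefix);
if ($start === false) { return null; }
$start += strlen($prefix); // skip past the opening marker so it is not part of the result
$end = strpos($text, ']]></SomeTag>', $start);
return $end === false ? null : substr($text, $start, $end - $start);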
vbence