PHP Regular expression tag matching

Question

Been beating my head against a wall trying to get this to work - help from any regex gurus would be greatly appreciated!

The text that has to be matched

[template option="whatever"] 

<p>any amount of html would go here</p>

[/template]

I need to pull the 'option' value (i.e. 'whatever') and the html between the template tags.

So far I have:

> /\[template\s*option=["\']([^"\']+)["\']\]((?!\[\/template\]))/

Which gets me everything except the html between the template tags.

Any ideas?

Thanks, Chris

What happens if `this is how you break a parser: [/template] It's broken now!` is the html? — ircmaxell, Jan 23 '11 at 03:53
@user551841 he did, it's PHP ... @Chris wow I thought I knew regexes but I don't get the middle part `"\'["\']]((?!` at all! Or is the PHP syntax that special? — Felix Dombek, Jan 23 '11 at 04:02
I suspect that you forgot to escape brackets. Remember - they have special meaning in regex? — ulidtko, Jan 23 '11 at 04:05
obligatory link to reasons not to use regular expressions to parse a non-regular language: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Mark Elliot, Jan 23 '11 at 04:05
Oh yes, answering such a question just cost me 2 rep from people who think so. If someone has an answer, I recommend to just put it in a comment — Felix Dombek, Jan 23 '11 at 04:08
ah.. I was using a blockquote instead of pre to display the regex and it was removing some of the characters - sorry! — Chris, Jan 23 '11 at 04:11
Mark E: I'm not really trying to parse the html - just to identify it! The content between the [template][/template] tag could be anything... — Chris, Jan 23 '11 at 04:14
well, your second parenthesis group includes the `[/template]` tag, but otherwise you should be able to access the contents of the parens by number! For the HTML, you can simply try a "reluctant" `.*` (probably `.*?` but I'm not familiar with PHP). Also be aware, of course, that your `option` value should not be empty or contain escaped `"` chars, otherwise this will not work ... — Felix Dombek, Jan 23 '11 at 04:16
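
The lookahead idea in the question can be made to work. In the original pattern the group `((?!\[\/template\]))` contains only the assertion, which is zero-width, so it always captures an empty string. A minimal sketch, assuming the input from the question is held in a hypothetical `$text` variable: repeat "one character that does not start `[/template]`" and add the `s` modifier so that `.` also matches newlines.

```php
<?php
// Sample input modelled on the question; $text is an illustrative name.
$text = "[template option=\"whatever\"]\n\n<p>any amount of html would go here</p>\n\n[/template]";

// (?:(?!\[\/template\]).)* consumes one character at a time, but only while
// the closing tag does not start at the current position.
$pattern = '/\[template\s*option=["\']([^"\']+)["\']\]((?:(?!\[\/template\]).)*)\[\/template\]/s';

if (preg_match($pattern, $text, $m)) {
    $option = $m[1];        // the option value
    $html   = trim($m[2]);  // the content between the tags, surrounding whitespace removed
    echo "option: $option\n";
    echo "html: $html\n";
}
```

This stops at the first `[/template]`, so it still cannot cope with the case ircmaxell raises, where the HTML itself contains that text.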

score 1 · Answer 1 · answered Jan 23 '11 at 04:18


Try this

/\[template\s*option=\"(.*)\"\](.*)\[\/template]/

Basically, instead of using a complex regex to match every single thing, just use (.*), which matches any run of characters. You want everything in between; it's not as if you need to verify the data in between.

answered Jan 23 '11 at 04:18

bhappy


Yes, I tried this, but it doesn't work, I presume because '.' matches any character except a newline, which the content may contain. Replacing (.*) with ([.\n]) didn't work either. – Chris Jan 23 '11 at 04:22
@Chris, there's a [modifier for this in PHP's regex](http://us.php.net/manual/en/reference.pcre.pattern.modifiers.php), in this case you'd follow the expression with an `s`, which makes `.` match newlines as well (`m` only changes how `^` and `$` behave). – Mark Elliot Jan 23 '11 at 04:32
@Mark E, great thanks for the tip and that brilliant answer to parsing html with regex is going on my wall to cheer me up first thing on a monday morning! – Chris Jan 23 '11 at 04:41
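
For reference, in PCRE the `m` modifier only changes where `^` and `$` match; the modifier that lets `.` match newlines is `s`. Inside a character class `.` is a literal dot, which is why `[.\n]` does not help. A small sketch of the difference, using a hypothetical multi-line `$text` and a lazy variant of the pattern from this answer:

```php
<?php
// Illustrative input with newlines inside the template block.
$text = "[template option=\"whatever\"]\n<p>line one</p>\n<p>line two</p>\n[/template]";

// Shared pattern body; only the modifier differs below.
$base = '\[template\s*option="(.*?)"\](.*?)\[\/template\]';

$withM = preg_match('/' . $base . '/m', $text, $a); // "." still stops at newlines
$withS = preg_match('/' . $base . '/s', $text, $b); // "." now matches newlines too

var_dump($withM, $withS);
```

With `s`, group 1 holds the option value and group 2 holds everything between the tags, newlines included.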

amcashcow · Accepted Answer · 2011-01-23T04:28:55.653


Edit: `[\s\S]` matches any character, whitespace or not, so unlike `.` it also matches newlines.

You may run into a problem when a large string contains consecutive blocks, because the greedy `+` runs on to the last `[/template]`. In that case, make the quantifier more specific: make it non-greedy (`+?`), give it a range such as `{1,200}`, or make the `[\s\S]` class more specific.

/\[template\s*option=["\']([^"\']+)["\']\]([\s\S]+)\[\/template\]/
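
To illustrate the point about consecutive blocks, here is a rough sketch comparing the greedy pattern above with its non-greedy variant. The two-block input and the variable names are made up for the example.

```php
<?php
// Two consecutive template blocks in one string (illustrative input).
$text = '[template option="a"]<p>first</p>[/template]'
      . "\n"
      . '[template option="b"]<p>second</p>[/template]';

$greedy = '/\[template\s*option=["\']([^"\']+)["\']\]([\s\S]+)\[\/template\]/';
$lazy   = '/\[template\s*option=["\']([^"\']+)["\']\]([\s\S]+?)\[\/template\]/';

// Greedy: one match that swallows both blocks up to the last [/template].
$greedyCount = preg_match_all($greedy, $text, $g, PREG_SET_ORDER);
// Lazy: one match per block.
$lazyCount = preg_match_all($lazy, $text, $l, PREG_SET_ORDER);

foreach ($l as $block) {
    echo $block[1], ' => ', $block[2], "\n";
}
```

`PREG_SET_ORDER` groups the results per match, so each `$block` holds the full match, the option value, and the content in that order.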

edited Jan 23 '11 at 04:28

answered Jan 23 '11 at 04:23

amcashcow


Good work! Yes that works. Well done. Thanks everyone else as well – Chris Jan 23 '11 at 04:29

score 0 · Answer 3 · answered Jan 23 '11 at 04:19


The `(?!...)` assertion is unneeded. Just match with `.*?` to capture as little as possible, and add the `s` modifier so that `.` also matches newlines.

/\[template\s*option=\pP([\h\w]+)\pP\]  (.*?)  \[\/template\]/xs
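
Because the `x` modifier makes PCRE ignore unescaped whitespace and treat `#` as the start of a comment, the same idea can be written as an annotated multi-line pattern. This is only a sketch: it assumes a `~` delimiter (so `/` needs no escaping), the `s` modifier for newlines, a PCRE build with Unicode property support for `\pP`, and a made-up `$text`.

```php
<?php
// Illustrative input; the content spans several lines.
$text = "[template option=\"whatever\"]\n<p>any amount of html would go here</p>\n[/template]";

$pattern = '~
    \[template \s* option=   # opening tag up to the attribute
    \pP ([\h\w]+) \pP \]     # punctuation-delimited value: group 1
    (.*?)                    # content, as little as possible: group 2
    \[/template\]            # literal closing tag
~xs';

if (preg_match($pattern, $text, $m)) {
    echo "option: {$m[1]}\n";
    echo "html: ", trim($m[2]), "\n";
}
```

Note that `[\h\w]+` only accepts word characters and horizontal whitespace, so option values containing other characters would not match.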

answered Jan 23 '11 at 04:19

mario


score 0 · Answer 4 · answered Jan 23 '11 at 04:44

Chris,

I see you've already accepted an answer. Great!

However, I don't think regular expressions are the right solution here. I think you can get the same effect with string manipulation (substrings, etc.).

Here is some code that may help you. If not now, maybe later in your coding endeavors.

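<?php

    $string = '[template option="whatever"]<p>any amount of html would go here</p>[/template]';

    // Everything from 'option=' onward, then drop the 8 characters of 'option="'.
    $extractoptionline = strstr($string, 'option=');
    $chopoff = substr($extractoptionline, 8);
    // The option value ends at the '"]' that closes the opening tag.
    $option = substr($chopoff, 0, strpos($chopoff, '"]'));

    echo "option: $option<br />\n";

    // Everything after the '"]' that closes the opening tag...
    $extracthtmlpart = strstr($string, '"]');
    $chopoffneedle = substr($extracthtmlpart, 2);
    // ...up to the closing tag. Searching for the full '[/template]' rather than
    // just '[/' avoids cutting the HTML short at an unrelated '[/'.
    $html = substr($chopoffneedle, 0, strpos($chopoffneedle, '[/template]'));

    echo "html: $html<br />\n";

?>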

Hope this helps anyone looking for a similar answer with a different flavor.
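
The same string-function approach can be wrapped in a function that checks for missing markers, since `strpos` returns `false` when nothing is found. The following is a sketch only: `extract_template` is a hypothetical helper name, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: returns [option, html], or null when a marker is missing.
 */
function extract_template(string $string): ?array
{
    $open = 'option="';
    $start = strpos($string, $open);
    if ($start === false) {
        return null;
    }

    // The option value runs from just after 'option="' to the closing '"]'.
    $valueStart = $start + strlen($open);
    $valueEnd = strpos($string, '"]', $valueStart);
    if ($valueEnd === false) {
        return null;
    }

    // The HTML runs from just after '"]' to the closing tag.
    $htmlStart = $valueEnd + 2;
    $htmlEnd = strpos($string, '[/template]', $htmlStart);
    if ($htmlEnd === false) {
        return null;
    }

    return [
        substr($string, $valueStart, $valueEnd - $valueStart),
        substr($string, $htmlStart, $htmlEnd - $htmlStart),
    ];
}

$result = extract_template('[template option="whatever"]<p>any amount of html would go here</p>[/template]');
$missing = extract_template('no template here');
var_dump($result, $missing);
```

Passing the offset as the third argument to `strpos` avoids building intermediate substrings and keeps each search anchored after the previous marker.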

can I ask why don't you think use of regular expressions is the right solution here? What is the disadvantage of using a regular expression? — Chris, Jan 23 '11 at 04:52
@Chris: For your purposes, and because you have a valid solution now, you can use regex. However, in general, I use regular expressions when I want to find some text in a document for which I cannot control (or do not know) the formatting. If the format of the document is more or less statically known, or has a particular structure, you can use string manipulation functions like I did here. Notice that the functions implicitly do the same thing as regex (find `[/` etc...). — aqua, Jan 23 '11 at 05:46
