
I am trying to match an XML tag with a regular expression. Here is my PHP code:

   $pattern = '#<xt:tag_name *(type\="(.+?)")? *(detail\="(.+?)")? ?/>#ui';
   $str = '<xt:tag_name type="1" detail="2" />';
   preg_replace($pattern,"type: $1, detail: $4",$str);
   preg_match($pattern,$str,$m);
   print_r($m);

And I get the expected result:

Array
(
    [0] => <xt:tag_name type="1" detail="2" />
    [1] => type="1"
    [2] => 1
    [3] => detail="2"
    [4] => 2
)

But when I change the order of the attributes,

<xt:tag_name detail="2" type="1" />

the match no longer works as expected.
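For reference, here is a small sketch that runs the question's own pattern against the reversed tag. As far as I can tell from how PCRE backtracks, preg_match does not return 0 here. It returns 1 but with the wrong captures: the optional type group is skipped, and the lazy (.+?) inside the detail group keeps extending past its own closing quote until the trailing ` ?/>` can match at the end of the tag.

```php
<?php
// The question's pattern, unchanged (the "\=" escape is harmless but unnecessary).
$pattern = '#<xt:tag_name *(type\="(.+?)")? *(detail\="(.+?)")? ?/>#ui';

// Same attributes as in the question, in the reversed order.
$reversed = '<xt:tag_name detail="2" type="1" />';

// The optional type group fails at "detail" and is skipped; the lazy (.+?)
// in the detail group then backtracks through the quotes until " />" matches.
$found = preg_match($pattern, $reversed, $m);
print_r($m);
```

Group 4 ends up holding `2" type="1` instead of `2`, which is why the captured values look wrong even though a match is reported.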

HamZa
Shushant
  • Did you read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 ? – sectus Jun 18 '13 at 12:50
  • Surely your pattern shows that `type` occurs before `detail`, so if you swap them around, you shouldn't expect the string to match? – nurdglaw Jun 18 '13 at 12:50
  • **1)** You don't need to escape `=` **2)** the preg_replace line is useless, remove it **3)** use `|` to alternate patterns. – HamZa Jun 18 '13 at 12:51
  • Why are you parsing XML with regular expressions? If you need something that's not memory intensive, `XMLReader` may be a good option. – Evert Jun 18 '13 at 12:52
  • I am writing a complex application, e.g. will be replaced with all the files and folders in folder_name – Shushant Jun 18 '13 at 12:56
  • Well, complex applications can use a proper XML parser even more ;) Trying to parse XML with regex is just a really bad idea. – Evert Jun 18 '13 at 12:58
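Here is a rough sketch of the parser-based approach suggested in these comments. The comment mentions XMLReader, but this example swaps in DOMDocument, which is simpler for small inputs. A namespace-aware parser only accepts the xt: prefix if it is bound to a namespace, so the urn:example:xt URI and the wrapping <root> element are hypothetical placeholders. readTagAttributes is likewise a hypothetical helper name. Attributes are looked up by name, so their order in the tag does not matter.

```php
<?php
// Hypothetical namespace URI bound to the xt: prefix (not taken from the question).
const XT_NS = 'urn:example:xt';

// Hypothetical helper: returns the type/detail attributes of every <xt:tag_name> element.
function readTagAttributes(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $found = [];
    foreach ($doc->getElementsByTagNameNS(XT_NS, 'tag_name') as $el) {
        // getAttribute() looks an attribute up by name, so attribute order is irrelevant;
        // it returns '' when the attribute is missing.
        $found[] = [
            'type'   => $el->getAttribute('type'),
            'detail' => $el->getAttribute('detail'),
        ];
    }
    return $found;
}

// Both attribute orders from the question, wrapped in a root element that declares the prefix.
$xml = '<root xmlns:xt="' . XT_NS . '">'
     . '<xt:tag_name type="1" detail="2" />'
     . '<xt:tag_name detail="2" type="1" />'
     . '</root>';

print_r(readTagAttributes($xml));
```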



Description

This regex captures the attributes type and detail regardless of their order, provided both are present inside the xt:tag_name tag.

<xt:tag_name\b(?=\s)(?=(?:(?!\>).)*\s\btype=(["'])((?:(?!\1).)*)\1)(?=(?:(?!\>).)*\s\bdetail=(["'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>
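When applied to the two strings from the question, this pattern should produce the same captures for either attribute order. The following minimal sketch uses the pattern above, written as a single-quoted PHP string so the single quote inside the character class is escaped as \'. The $results array is only for illustration.

```php
<?php
// The answer's pattern as a single-quoted PHP string.
$pattern = '/<xt:tag_name\b(?=\s)(?=(?:(?!\>).)*\s\btype=(["\'])((?:(?!\1).)*)\1)(?=(?:(?!\>).)*\s\bdetail=(["\'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>/ims';

// Both attribute orders from the question.
$inputs = [
    '<xt:tag_name type="1" detail="2" />',
    '<xt:tag_name detail="2" type="1" />',
];

$results = [];
foreach ($inputs as $str) {
    if (preg_match($pattern, $str, $m) === 1) {
        // Group 2 holds the type value, group 4 the detail value.
        $results[] = ['type' => $m[2], 'detail' => $m[4]];
    }
}
print_r($results);
```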


Expanded Description

  • <xt:tag_name\b validates the tag name
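  • (?=\s) ensures there is whitespace after the tag name
  • (?= opens lookahead 1, for type. A lookahead does not consume any characters, so both lookaheads start scanning from the same position right after the tag name; this is what lets the attributes appear in any order.
    • (?:(?!\>).)* moves through the tag one character at a time, never stepping past a >, so the regex engine cannot leave this tag before it reaches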
    • \s\btype= the attribute type
    • (["']) captures the open quote, which is used later to match the corresponding close quote
    • ((?:(?!\1).)*) capture all characters inside the quotes, but not including the same type of encapsulating quote
    • \1 match the close quote
    • ) close the lookahead for type
  • (?=(?:(?!\>).)*\s\bdetail=(["'])((?:(?!\3).)*)\3) does the exact same thing for attribute named detail as was done for type
  • (?:(?!\>).)* match all characters until
  • \> the end of the tag
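The tempered token (?:(?!\>).)* keeps both lookaheads inside the current tag. The sketch below contrasts it with a hypothetical "naive" variant that uses plain .* in the lookaheads. The test string is also made up: its type attribute belongs to a different, later tag. Under these assumptions the naive version still matches and borrows that value, while the answer's pattern does not match at all.

```php
<?php
// The first tag has no type attribute; a different tag right after it does.
$str = '<xt:tag_name detail="2" /><xt:other type="1" />';

// Answer's pattern: (?:(?!\>).)* cannot step past a ">", so each lookahead stays inside one tag.
$tempered = '/<xt:tag_name\b(?=\s)(?=(?:(?!\>).)*\s\btype=(["\'])((?:(?!\1).)*)\1)(?=(?:(?!\>).)*\s\bdetail=(["\'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>/ims';

// Hypothetical naive variant: with the s flag, .* runs across ">" into later tags.
$naive = '/<xt:tag_name\b(?=\s)(?=.*\s\btype=(["\'])((?:(?!\1).)*)\1)(?=.*\s\bdetail=(["\'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>/ims';

$temperedHit = preg_match($tempered, $str, $t); // no type inside <xt:tag_name ...>, so no match
$naiveHit    = preg_match($naive, $str, $n);    // matches, taking type="1" from <xt:other>

var_dump($temperedHit, $naiveHit);
```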

Groups

Group 0 will have the entire tag from the open to close bracket

  1. will have the open quote around the type value, this allows the regex to correctly match the close quote
  2. will have the value from attribute type
  3. will have the open quote around the detail value, this allows the regex to correctly match the close quote
  4. will have the value from attribute detail
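The following sketch shows one way to read those groups per tag, if you want type/detail pairs rather than the per-group arrays that preg_match_all returns by default. With the PREG_SET_ORDER flag, each match becomes one array holding groups 0 to 4, so groups 2 and 4 can be read side by side. The input is the string from the example below; $pairs is only an illustrative name.

```php
<?php
// Input string from the answer's example (single quotes inside the value escaped for PHP).
$sourcestring = '<xt:tag_name UselessAttribute="some dumb string" type="1" detail="2" />'
    . '<xt:tag_name detail="Things \'Punk\' Loves" MoreUselessAttributes="1231" type="kittens" />';

$pattern = '/<xt:tag_name\b(?=\s)(?=(?:(?!\>).)*\s\btype=(["\'])((?:(?!\1).)*)\1)(?=(?:(?!\>).)*\s\bdetail=(["\'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>/ims';

$pairs = [];
// PREG_SET_ORDER groups the captures per match instead of per group.
if (preg_match_all($pattern, $sourcestring, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $m) {
        // Groups 1 and 3 only hold the quote characters; 2 and 4 hold the values.
        $pairs[] = ['type' => $m[2], 'detail' => $m[4]];
    }
}
print_r($pairs);
```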

PHP Code Example:

Input string

<xt:tag_name UselessAttribute="some dumb string" type="1" detail="2" /><xt:tag_name detail="Things 'Punk' Loves" MoreUselessAttributes="1231" type="kittens" />

Code

<?php
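// the input string shown above (single quotes inside it escaped as \')
$sourcestring='<xt:tag_name UselessAttribute="some dumb string" type="1" detail="2" /><xt:tag_name detail="Things \'Punk\' Loves" MoreUselessAttributes="1231" type="kittens" />';
preg_match_all('/<xt:tag_name\b(?=\s)(?=(?:(?!\>).)*\s\btype=(["\'])((?:(?!\1).)*)\1)(?=(?:(?!\>).)*\s\bdetail=(["\'])((?:(?!\3).)*)\3)(?:(?!\>).)*\>/ims',$sourcestring,$matches);
echo "<pre>".print_r($matches,true)."</pre>";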
?>

Matches

$matches Array:
(
[0] => Array
    (
        [0] => <xt:tag_name UselessAttribute="some dumb string" type="1" detail="2" />
        [1] => <xt:tag_name detail="Things 'Punk' Loves" MoreUselessAttributes="1231" type="kittens" />
    )

[1] => Array
    (
        [0] => "
        [1] => "
    )

[2] => Array
    (
        [0] => 1
        [1] => kittens
    )

[3] => Array
    (
        [0] => "
        [1] => "
    )

[4] => Array
    (
        [0] => 2
        [1] => Things 'Punk' Loves
    )
)
Ro Yo Mi
  • Not the first time I've seen you post regex images. What software are you using? – Sebastian Jun 19 '13 at 00:52
  • Hey Sebastian, I'm using debuggex.com. Although it doesn't support lookbehinds or atomic groups, it's still handy for understanding the expression flow. There is also regexper.com. They do a pretty good job too, but it's not real time as you're typing. – Ro Yo Mi Jun 19 '13 at 01:13