
I am working on a Node.js project that searches a set of PHP view files and replaces some of their attributes. I am trying to get the attribute values of HTML opening tags and replace them.

Basically, if this is the tag

<tag attr1="[capture ANYTHING inside single/double quotes]" attr2='[CAPTURE ANYTHING]'></tag>

I want to capture anything inside the attribute quotes, and by [ANYTHING] I mean really anything!

example2: attr="with HTML <br/><b>also being captured</b>"
example3: attr="with line break style \n or \n\r this is still part of what should be captured and this line too!"
example4: attr="a PHP code <?php echo $ThisPHPcodeisInsideTheQoutes?> should be captured as well!"
example5: title="{{angular?'if inside the attribute': 'it should be captured as well' }}"

I wrote the following regex:

/<\w+\s+(:?[\w-]+=(:?"|')(.|[\r\n])*?\2\s*?)>?/g

This regex catches only the first attribute.

Regex breakdown:

< tag start
\w+ a word, mainly the tag name; this avoids matching PHP tags such as <?php
\s+ one or more spaces, as in <tag attr
(:? a non-capturing group 1; I want to match multiple attributes but capture only the content
[\w-]+ a word that may contain -, for example attr or ng-attr
= the attribute equals sign
(:?"|') a non-capturing group 2: an opening single or double quote
(.|[\r\n])*? the actual data I am trying to capture: everything (.) or a line break ([\r\n])
\2 a back reference to (:?"|'), so we'll have "[data]" or '[data]'
\s*? zero or more spaces before the next tag, not greedy
) close of non-capturing group 1
>? end of the opening tag, not greedy

I don't understand why multiple attributes are not being captured. Thanks in advance for the help.

Thom A
Wazime
  • `(:?` is a non-capturing group? `\w` will match the `?` in `<?php`? `>?` is a non-greedy match (hint: no, it's an optional `>`). –  Sep 01 '16 at 11:05
  • @torazaburo please run it in a regex editor and you will see that your comment is wrong. You can see it here: http://refiddle.com/refiddles/57c80c5275622d7947c11600 – Wazime Sep 01 '16 at 11:08
  • Which comment do you mean? I don't need to use a regexp editor to know that `(:?` is **not** a non-capturing group; it's a group starting with an optional `:`. You probably meant `(?:`. This could possibly be the reason for your regexp not capturing multiple attributes. –  Sep 01 '16 at 11:10
  • Where is your closing quote? What is `\2` supposed to refer to, since you're (trying) not to capture the group containing the quotes, right? –  Sep 01 '16 at 11:15
  • Try this, it may help you: https://regex101.com/r/xA7uN8/3 – Shekhar Khairnar Sep 01 '16 at 11:16
  • @ShekharKhairnar Your solution will capture PHP tags, and there is no back reference to the open quote: https://regex101.com/r/fY1oB0/4 But thanks – Wazime Sep 01 '16 at 11:38
  • @torazaburo \2 refers to the opening single or double quote. You can see that it's working great in the fiddle I published with the question. – Wazime Sep 01 '16 at 11:39
  • By definition, a back-reference does NOT work with a non-capturing group. It works for you only because you are writing the non-capturing group INCORRECTLY as `(:?`, which, as I said an hour ago, is NOT a non-capturing group, but rather a capturing group starting with an optional colon. If you love the regexp editors so much, please review CAREFULLY their narrative description of your `(:?` construct. –  Sep 01 '16 at 12:06
  • @torazaburo thanks, fixed my query. – Wazime Sep 01 '16 at 16:32
  • Use a proper parser... – robertklep Sep 01 '16 at 18:34
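
The `(:?` versus `(?:` point from the comments can be checked directly in Node.js. This is only a minimal sketch; the names `asWritten` and `nonCapturing` are illustrative and not from the question:

```javascript
// As written in the question: a CAPTURING group that matches an optional ':'
// followed by '"', or a single "'". Because it captures, \1 can refer to it.
const asWritten = /(:?"|')x\1/;

// A true non-capturing group: nothing is stored, so no backreference is possible.
const nonCapturing = /(?:"|')x/;

const capturedMatch = asWritten.exec('"x"');
const plainMatch = nonCapturing.exec('"x"');

console.log(capturedMatch.length); // 2: the full match plus one captured group
console.log(capturedMatch[1]);     // the captured opening quote character
console.log(plainMatch.length);    // 1: the full match only

// The optional colon really is part of the pattern: ':"' is accepted.
const colonAccepted = /^(:?"|')$/.test(':"');
console.log(colonAccepted);        // true
```

This is why `\2` appeared to work in the question's regex: the group it points at was capturing all along.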

1 Answer


I don't see how this can be done with a single regex match. As far as I am aware, you cannot capture a variable number of subpatterns that each end with a backreference; a repeated capturing group only retains its last iteration.
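
To illustrate that limitation, here is a small sketch. The pattern `singleMatch` and the sample tag are made up for illustration and are not the question's regex. When the attribute group is repeated inside one match, JavaScript keeps only the text from the last repetition:

```javascript
// Hypothetical sample tag with three quoted attributes.
const sampleTag = `<tag a="1" b='2' c="3">`;

// One match for the whole tag; the attribute group is repeated with +.
// Group 1 = attribute name, group 2 = opening quote, group 3 = value.
const singleMatch = /<\w+(?:\s+([\w-]+)=("|')([\s\S]*?)\2)+\s*>/;

const repeated = singleMatch.exec(sampleTag);

// Only the final iteration survives in the capture groups.
console.log(repeated[1]); // 'c'
console.log(repeated[3]); // '3'
```

The values of `a` and `b` were matched but are not available afterwards, which is why a second pass over each tag is needed.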

Instead, I would recommend processing the HTML in two steps. First, extract the opening tag string using

/<\w+\s+[\w-]+=("|')(?:.|[\r\n])*?\1\s+.*?>/g

and then go back through the matches and extract each of the attribute/value pairs using

/([\w-]+=("|')(?:.|[\r\n])*?\2)/g

At that point, you can split on the first "=" to break apart each attribute from its value.
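
A rough sketch of the two steps in Node.js is below. A few assumptions to flag: `extractAttributes` is a hypothetical helper name; the tag-level pattern is a variation of the first expression above (it matches each quoted value with `"[^"]*"|'[^']*'` instead of a backreference, so a `>` inside a value does not end the tag early); and the attribute pattern captures name and value in separate groups rather than splitting on "=". It also relies on `String.prototype.matchAll`, which needs Node.js 12 or later.

```javascript
// Step 1 finds opening tags that carry at least one quoted attribute.
// Step 2 walks each matched tag and pulls out every name/value pair.
function extractAttributes(source) {
  // Assumption: tag names and attribute names are simple \w / [\w-] words.
  const tagPattern = /<\w+(?:\s+[\w-]+=(?:"[^"]*"|'[^']*'))+\s*\/?>/g;
  // Group 1 = name, group 2 = opening quote, group 3 = value.
  const attrPattern = /([\w-]+)=("|')([\s\S]*?)\2/g;

  const results = [];
  for (const tagMatch of source.matchAll(tagPattern)) {
    for (const [, name, , value] of tagMatch[0].matchAll(attrPattern)) {
      results.push({ name, value });
    }
  }
  return results;
}

// Usage with a made-up sample containing HTML and PHP inside attribute values.
const sampleView = `<div title="with HTML <br/><b>bold</b>" data-x='<?php echo $v ?>'></div>`;
console.log(extractAttributes(sampleView));
```

This is still regex-based and will miss unquoted or valueless attributes; for anything beyond simple views, a real HTML parser (as suggested in the comments) is likely the safer route.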

Here is a fiddle implementing what I recommend. Your sample text should parse out the way you want it.