Can anyone look at my regex in JavaScript and suggest a correct one?

I'm trying to select attribute (name/value) pairs in an HTML/XML string like the following:

<unknowncustom:tag attrib1="XX' XX'" attrib2='YY" YY"' attrib3=ZZ""'>/unknowncustom:tag>

SOME TEXT that is not part of any tag and should not be selected, name='XX', y='ee';

<custom:tag attrib1="XX' XX'" attrib2='YY" YY"' attrib3=ZZ""'>/custom:tag>

I found many solutions, but none seem foolproof (including this one: "Regular expression for extracting tag attributes").

My current regex selects the first attribute pair, but I can't figure out how to make it select all matching attributes. Here is the regex:

/<\w*:?\w*\s+(?:((\w*)\s*=\s*((?:(?:"[^"]*")|(?:'[^']*')|[^>\s]+))))[^>]*>/gim

Thanks

Vishal Seth
  • expected output ? , I dont understand english – aelor May 21 '14 at 04:38
  • [**No**](http://stackoverflow.com/a/1732454/497418). Don't do that. [Regular Expressions are the wrong solution](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Use a document fragment and let the browser use its native parsing functionality. – zzzzBov May 21 '14 at 04:40
  • expected output is to be able to loop through all attributes and value pairs. To the guy who suggests let browser use its parsing functionality, that just does not serve my purpose. If not regex then I might have to write a JS function to process it. The problem with using browser's parsing is that when I try to get back my original content's source its custom attributes are not returned back. – Vishal Seth May 21 '14 at 04:47
  • Working on it but a few questions: 1. is this `attrib3=ZZ""'` correct? (quotes in the right place?) 2. do you always have 3 attributes? 3. is the `tag:` string important, or can we assume that any key-value pair in a tag is fine? – zx81 May 21 '14 at 06:04
  • There can be any number of attributes. As for the last attribute example, the objective is to capture anything until we hit a space if it's not enclosed in double or single quotes. Re: ":tag", yes, some elements can be part of namespaces. Thanks – Vishal Seth May 21 '14 at 14:50

1 Answer

Let's have a go:

/(\w+)\s*=\s*((["'])(.*?)\3|([^>\s]*)(?=\s|\/>))(?=[^<]*>)/g

Regex is not ideal for this. If your attributes contain unescaped angle brackets < > it probably will not work.

Proof: http://regex101.com/r/dD4uT4
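
To loop through all name/value pairs, as asked in the comments, the pattern above can be applied with `String.prototype.matchAll`. This is only a minimal sketch: `ATTRIBUTE_PATTERN` and `extractAttributes` are names made up for illustration, and the sample input is a simplified variant of the one in the question (the last attribute is followed by ` />`).

```javascript
// Regex from the answer above, unchanged.
// Groups: 1 = name, 3 = opening quote, 4 = quoted value, 5 = unquoted value.
const ATTRIBUTE_PATTERN = /(\w+)\s*=\s*((["'])(.*?)\3|([^>\s]*)(?=\s|\/>))(?=[^<]*>)/g;

// Hypothetical helper: collects { name, value } for every match in the string.
function extractAttributes(markup) {
  const pairs = [];
  for (const match of markup.matchAll(ATTRIBUTE_PATTERN)) {
    // Group 4 is set for quoted values (possibly empty), group 5 for unquoted ones.
    const value = match[4] !== undefined ? match[4] : match[5];
    pairs.push({ name: match[1], value });
  }
  return pairs;
}

// Simplified sample (an assumption, not the exact string from the question).
const sample = `<custom:tag attrib1="XX' XX'" attrib2='YY" YY"' attrib3=ZZ />
SOME TEXT name='XX', y='ee';`;

for (const { name, value } of extractAttributes(sample)) {
  console.log(name + " => " + value);
}
```

Note that, as far as I can tell, an unquoted value directly followed by `>` (as in `attrib3=ZZ""'>` from the question) is not matched, because the lookahead `(?=\s|\/>)` requires whitespace or `/>` after it.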

Lee Kowalkowski