
I want to match HTML tags and their attributes. I tried the following regex:

/<(\w+)(?: +(\w+)="[\w,;.:\-#'+~*?=&%\$!\/'\]\[@\(\)\s]*")*/gm

On that input:

<p><li first="1" second="2" third="3"></li><b><br/><p><li first="1" second="2" third="3"></li><b><br/></p>
<p><li first="1" second="2"></li><b><br/><p><li first="1" second="2"></li><b><br/></p>
<p><li first="1"></li><b><br/><p><li first="1"></li><b><br/></p>

I only get one attribute. If a tag has more than one attribute, I always get the last one: for group 2, the first row returns third, the second row returns second, and the last row returns first.

The result for line one is:

p li third b br p li third b br

But should be:

p li first second third b br p li first second third b br
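A minimal sketch that reproduces the behavior. It uses Python only because the question doesn't name a host language (an assumption; Python's `re`, like JavaScript and PCRE, keeps only the last iteration's value for a capture group inside a quantifier). `LINE` is the first half of row one; `PATTERN`, `LINE` and `pairs` are illustrative names.

```python
import re

# The question's pattern, unchanged: group 1 is the tag name, group 2 sits
# inside the repeated (?:...)* group, so each iteration overwrites it.
PATTERN = re.compile(
    r'''<(\w+)(?: +(\w+)="[\w,;.:\-#'+~*?=&%\$!\/'\]\[@\(\)\s]*")*'''
)

LINE = '<p><li first="1" second="2" third="3"></li><b><br/>'

# For each match, record the tag name and whatever group 2 holds at the end.
pairs = [(m.group(1), m.group(2)) for m in PATTERN.finditer(LINE)]
print(pairs)
# [('p', None), ('li', 'third'), ('b', None), ('br', None)]
```

The `li` match does step through `first`, `second` and `third`, but only the final value survives in group 2; tags without attributes leave the group unset (`None`).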

How do I get all attributes of a tag?

Wiktor Stribiżew
Chris
  • Why not use something like the [HTML agility pack?](https://html-agility-pack.net/) – emsimpson92 Dec 18 '18 at 20:21
  • [Have you tried using an XML parser instead?](https://stackoverflow.com/a/1732454) – Ivar Dec 18 '18 at 20:21
  • I'm not allowed to use any package or tool except regex to get that problem solved. – Chris Dec 18 '18 at 20:29
  • [THE PONY HE COMES](https://stackoverflow.com/a/1732454/3343753), why don't you use an HTML parser? – Pedro Rodrigues Dec 22 '18 at 00:22
  • @Pedro Rodrigues thank you for that great link! My example was a school exercise and only regex was allowed. In projects I still use HTML parsers and libraries to solve that :) – Chris Dec 27 '18 at 13:44
  • @Chris, you're welcome. For future reference, if it is homework, please note that in your questions. Most of the time people will notice anyway, but stating it from the get-go will get you the best results on Stack Overflow. – Pedro Rodrigues Dec 27 '18 at 20:16
  • What a bad exercise, tell that to your teacher. – Pedro Rodrigues Dec 27 '18 at 20:18

1 Answer

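First of all, I think you can use [^"] instead of [\w,;.:\-#'+~*?=&%\$!\/'\]\[@\(\)\s]. An attribute value in your input ends at the next double quote, so "any character except a double quote" describes it more simply and also accepts characters your list leaves out.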
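A quick check, sketched in Python (an assumed host language, since the question names none), that the simplified value class produces exactly the same matches as the original on the question's three input rows. `LONG`, `SHORT`, `SAMPLE` and `matches` are illustrative names.

```python
import re

# Original attribute-value class from the question.
LONG = re.compile(
    r'''<(\w+)(?: +(\w+)="[\w,;.:\-#'+~*?=&%\$!\/'\]\[@\(\)\s]*")*'''
)
# Simplified: a value is any run of characters other than a double quote.
SHORT = re.compile(r'<(\w+)(?: +(\w+)="[^"]*")*')

SAMPLE = """<p><li first="1" second="2" third="3"></li><b><br/><p><li first="1" second="2" third="3"></li><b><br/></p>
<p><li first="1" second="2"></li><b><br/><p><li first="1" second="2"></li><b><br/></p>
<p><li first="1"></li><b><br/><p><li first="1"></li><b><br/></p>"""

def matches(pattern):
    # Compare the full matched text and both groups, so any difference shows up.
    return [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(SAMPLE)]

print(matches(LONG) == matches(SHORT))  # True
```

Note that `[^"]` is broader, not just shorter: it also admits `<`, `>`, `|` or a newline inside a value, which the long list rejects. Either way, group 2 still holds only the last attribute.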

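Unfortunately, with a single regex match it is not possible to capture all of the attributes in your case. In most regex engines (JavaScript, PCRE, Python's re), a capturing group inside a quantifier only keeps the value from its last iteration, so group 2 always ends up holding the last attribute. For further explanation see this post: How to capture multiple repeated groups?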
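A common workaround that still uses only regex is to split the job into two passes: capture each opening tag's whole attribute section with one group wrapped around the repetition, then run a second, global search over that text. This is a minimal sketch in Python (the host language is an assumption), with hypothetical names `TAG`, `ATTR` and `tag_and_attribute_names`; it handles only the double-quoted `name="value"` form used in the question.

```python
import re

# Pass 1: tag name, plus ALL attribute text captured by one group that wraps
# the repetition (so nothing is overwritten between iterations).
TAG = re.compile(r'<(\w+)((?: +\w+="[^"]*")*)')
# Pass 2: one match per attribute inside that captured text.
ATTR = re.compile(r'(\w+)="[^"]*"')

def tag_and_attribute_names(html):
    names = []
    for tag in TAG.finditer(html):
        names.append(tag.group(1))
        # findall returns every attribute name, not just the last one;
        # for tags without attributes group 2 is '' and this adds nothing.
        names.extend(ATTR.findall(tag.group(2)))
    return names

row = '<p><li first="1" second="2" third="3"></li><b><br/><p><li first="1" second="2" third="3"></li><b><br/></p>'
print(" ".join(tag_and_attribute_names(row)))
# p li first second third b br p li first second third b br
```

The output matches the expected result for line one in the question.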

JL Meier