
How would you use a regex to remove all HTML attributes except type="A", type="1", and type="I"?

The following HTML:

<ol type="A" lang="en-CA" style="margin-bottom: 0in; line-height: 100%">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="I" style="font-weight: bold;">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>

Should become:

<ol type="A">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="I">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
a_b
  • A dedicated parser will *always* be better than any regex you can come up with, because a regex will choke on [perfectly valid HTML markup like this](http://stackoverflow.com/questions/701166/), whereas a DOM parser won't. – Amal Murali Aug 05 '14 at 14:34
  • You [would not. At all.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – GolezTrol Aug 05 '14 at 14:35
  • (type="[\w]").*> - maybe this could work. Btw this is a good site for testing: https://www.debuggex.com/ – user2718671 Aug 05 '14 at 14:56
  • Is this [http://regex101.com/r/vL9vL6/3](http://regex101.com/r/vL9vL6/3) close to what you're looking for? – hex494D49 Aug 05 '14 at 15:15
  • Yes @hex494D49, looks like it, thank you. I will also look into using an HTML parser, based on the comments from Amal and Golez. Thanks. – a_b Aug 05 '14 at 15:35
  • Shall I post it as a (possible) answer if you found it useful? :) – hex494D49 Aug 05 '14 at 15:44
  • sure, since it actually does answer the question... – a_b Aug 05 '14 at 16:11
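
For the parser route suggested in the comments above, here is a minimal sketch using the browser's built-in DOM. The names `ALLOWED_TYPES`, `stripAttributes`, and `cleanHtml` are hypothetical (they do not come from the thread), and `cleanHtml` assumes a browser environment where `DOMParser` is available.

```javascript
// Only these values of the type attribute are kept (assumption taken from the question).
const ALLOWED_TYPES = new Set(['A', '1', 'I']);

// Hypothetical helper: removes every attribute from the descendants of `root`
// except type="A", type="1", and type="I".
function stripAttributes(root) {
  for (const el of root.querySelectorAll('*')) {
    // Copy the attribute list first, because removing attributes mutates the live list.
    for (const attr of Array.from(el.attributes)) {
      const keep = attr.name === 'type' && ALLOWED_TYPES.has(attr.value);
      if (!keep) el.removeAttribute(attr.name);
    }
  }
  return root;
}

// Browser-only wrapper: parse the markup, clean it, and serialize it again.
function cleanHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return stripAttributes(doc.body).innerHTML;
}
```

Because the parser builds a real element tree, attribute order, quoting style, and `>` characters inside attribute values do not matter here.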

1 Answer


If you prefer to do it with a regular expression, try the following one:

(<[a-z]+\stype="[AI1]")[^>]*

Replacement:

$1

Input:

<ol type="A" lang="en-CA" style="margin-bottom: 0in; line-height: 100%">
  <li type="A" s="something"><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="I" style="font-weight: bold;">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="1" style="font-weight: bold;">
  <li type="I" another="attribute"><span>Text</span></li>
  <li><span>More text</span></li>
</ol>

Output:

<ol type="A">
  <li type="A"><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="I">
  <li><span>Text</span></li>
  <li><span>More text</span></li>
</ol>
<ol type="1">
  <li type="I"><span>Text</span></li>
  <li><span>More text</span></li>
</ol>

Simple usage in JavaScript:

var s = '<ol type="A" lang="en-CA" style="margin-bottom: 0in; line-height: 100%">';
var p = /(<[a-z]+\stype="[AI1]")[^>]*/g;
console.log(s.replace(p, '$1'));

Output:

<ol type="A">
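
The pattern above only matches tags whose first attribute is `type`, so a tag such as `<ol style="..." type="I">` or `<li class="...">` is left unchanged. Below is a rough sketch of a variant that handles `type` in any position and also strips tags that have no `type` at all. The function name `keepOnlyListType` is hypothetical, and the sketch assumes double-quoted attribute values with no `>` inside them.

```javascript
// Hypothetical helper (not part of the original answer): rewrites every opening
// tag so that it keeps at most one attribute, type="A", type="1", or type="I".
function keepOnlyListType(html) {
  // Groups: tag name, raw attribute text, optional self-closing slash.
  // Closing tags such as </ol> do not match and are left untouched.
  return html.replace(/<([a-z][\w-]*)([^>]*?)(\/?)>/gi, (tag, name, attrs, slash) => {
    // Require start-of-text or whitespace before "type" so data-type="A" is not kept.
    const type = attrs.match(/(?:^|\s)type="([AI1])"/);
    return '<' + name + (type ? ' type="' + type[1] + '"' : '') + slash + '>';
  });
}

console.log(keepOnlyListType('<ol style="x" type="I" lang="en"><li class="a">Text</li></ol>'));
// <ol type="I"><li>Text</li></ol>
```

This is still a regex over markup, so tag-like text inside comments or scripts would be rewritten as well.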


hex494D49