What C# regular expression would replace all of these:

<BR style=color:#93c47d>
<BR style=color:#fefefe>
<BR style="color:#93c47d">
<BR style="color:#93c47d ...">
<BR>
<BR/>
<br style=color:#93c47d>
<br style=color:#fefefe>
<br style="color:#93c47d">
<br style="color:#93c47d ...">
<br>
<br/>

with:

<br/>

Basically, "remove all attributes from any BR element and lowercase it".

Edward Tanguay
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – thecoop Mar 22 '10 at 15:25
  • @thecoop: That is only relevant for actually parsing HTML, which this question does not require. In this case, the only thing that could break the regex is if there were a ">" inside an attribute, which I believe is invalid anyway. – Michael Myers Mar 22 '10 at 15:27
  • Who is the man who thought of that HTML? Can't imagine a use case. – Dykam Mar 22 '10 at 15:29
  • @Dykam believe it or not, this is HTML that is generated from published google docs, along with FONT tags – Edward Tanguay Mar 23 '10 at 03:09
  • yes, I think google docs must have a specification that their HTML output be compatible with Mosaic 1.0, reminds me of the HTML back in 1993 i.e. FONT tag, no attribute quotes – Edward Tanguay Mar 23 '10 at 07:44

2 Answers

Something like:

Regex.Replace(myString, "<br[^>]*>", "<br/>", RegexOptions.IgnoreCase);

Or without the IgnoreCase:

Regex.Replace(myString, "<[Bb][Rr][^>]*>", "<br/>");
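
As a quick check, here is a minimal sketch that runs the first pattern over the tags from the question. The class and method names (`BrNormalizer`, `NormalizeBr`) are made up for illustration, and the `\b` after `br` is an added assumption so that a tag whose name merely starts with "br" is left alone. Like the original pattern, it would stop early at a `>` inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrNormalizer
{
    // Hypothetical helper: replaces any <br ...> tag, in any case, with <br/>.
    // The \b is an addition to the pattern above; it keeps "<break>" from matching.
    public static string NormalizeBr(string html) =>
        Regex.Replace(html, @"<br\b[^>]*>", "<br/>", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string[] samples =
        {
            "<BR style=color:#93c47d>",
            "<BR style=\"color:#93c47d\">",
            "<BR>",
            "<BR/>",
            "<br style=color:#fefefe>",
            "<br>",
        };

        // Every sample prints as <br/>.
        foreach (var sample in samples)
        {
            Console.WriteLine(NormalizeBr(sample));
        }
    }
}
```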
Michael Myers

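Assuming no other attributes ever follow style, something like the following would strip it. Note that it only removes the style attribute: the tag keeps its original case and is left with a trailing space (e.g. <BR >), so it does not by itself produce <br/>.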

using System;
using System.Text.RegularExpressions;

class Program
{
  const string SOURCE = @"<BR style=color:#93c47d>
<BR style=color:#fefefe>
<BR style=""color:#93c47d"">
<BR style='color:#93c47d'>
<BR>
<BR/>
<br style=color:#93c47d>
<br style=color:#fefefe>
<br style=""color:#93c47d"">
<br style='color:#93c47d'>
<br>
<br/>";

  static void Main(string[] args)
  {
    // Alternatives: unquoted value (runs to the closing '>'), double-quoted, single-quoted.
    const string EXPRESSION = @"(style=[^""'][^>]*)|(style=""[^""]*"")|(style='[^']*')";

    var regex = new Regex(EXPRESSION);

    Console.WriteLine(regex.Replace(SOURCE, string.Empty));
  }
}

You might be better off with a programmatic solution if there are attributes written into a tag after the style attribute.
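
One way such a programmatic approach might look, as a rough sketch rather than a tested solution: match each BR tag first, then remove the style attribute inside it wherever it appears, leaving any other attributes alone. The names `StyleStripper`, `BrTag`, `StyleAttr` and `StripStyle`, and the `class`/`id` attributes in the samples, are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleStripper
{
    // Matches a whole <br ...> tag; quoted attribute values may contain '>'.
    static readonly Regex BrTag = new Regex(
        @"<br\b(?:[^>""']|""[^""]*""|'[^']*')*>",
        RegexOptions.IgnoreCase);

    // Matches one style attribute with a double-quoted, single-quoted, or unquoted value.
    static readonly Regex StyleAttr = new Regex(
        @"\s+style\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: strips style from BR tags only, keeping other attributes.
    public static string StripStyle(string html) =>
        BrTag.Replace(html, m => StyleAttr.Replace(m.Value, string.Empty));

    public static void Main()
    {
        Console.WriteLine(StripStyle("<BR style=color:#93c47d class=x>"));    // <BR class=x>
        Console.WriteLine(StripStyle("<br style=\"color:#93c47d\" id='a'>")); // <br id='a'>
    }
}
```

This still leaves the tag's case untouched; to get the `<br/>` form asked for in the question, the evaluator could simply return "<br/>" instead.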