Can anyone tell me a regex pattern that finds empty span tags and replaces them with a <br/> tag?

Something like the below :

string io = Regex.Replace(res,"" , RegexOptions.IgnoreCase);

I don't know what pattern to pass in!

Malcolm
  • Please note that [regex should not be used to parse HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) –  Jan 31 '11 at 20:44

4 Answers

The code by Jeff Mercado has an error on these lines:

.Where(e => e.Name.Equals("span", StringComparison.OrdinalIgnoreCase) && n.Name.Equals("span", StringComparison.OrdinalIgnoreCase)

Error message: Member 'object.Equals(object, object)' cannot be accessed with an instance reference; qualify it with a type name instead

They still didn't work when I tried replacing them with other objects!

SliverNinja - MSFT
Tri

This pattern will find all empty span tags, such as <span/> and <span></span>:

<span\s*/>|<span>\s*</span>

So this code should replace all your empty span tags with br tags:

string io = Regex.Replace(res, @"<span\s*/>|<span>\s*</span>", "<br/>");
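
As a quick check of the pattern above, here is a minimal sketch that applies it to a sample string. The helper name ReplaceEmptySpans and the sample input are made up for illustration, RegexOptions.IgnoreCase is added so that <SPAN> is caught too, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: swaps attribute-less, empty span tags for <br/>.
static string ReplaceEmptySpans(string html) =>
    Regex.Replace(html, @"<span\s*/>|<span>\s*</span>", "<br/>",
                  RegexOptions.IgnoreCase);

// Made-up sample input: three empty spans and one span with content.
string res = "a<span/>b<span> </span>c<SPAN></SPAN>d<span>x</span>";

// Prints: a<br/>b<br/>c<br/>d<span>x</span>
Console.WriteLine(ReplaceEmptySpans(res));
```

Note that a span carrying attributes, such as <span class="x"></span>, is not matched by this pattern and is left untouched.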
Andreas Vendel
  • @Andreas Thanks for the answer. I also want to check for a span that has only spaces as its content and replace them with &nbsp;. Say, if the content has two spaces, replace them with two &nbsp;. – Malcolm Jan 31 '11 at 12:55
  • @Malcolm Try this: Regex.Replace(html, @"\s*", (match) => match.Value.Replace(" ", "&nbsp")) – Andreas Vendel Jan 31 '11 at 13:41
  • @Andreas That didn't work. It doesn't get replaced :(. Thanks for your response – Malcolm Jan 31 '11 at 14:16
  • @Malcolm Did you capture the return value of Replace? It is supposed to be: html = Regex.Replace(html, @"\s*", (match) => match.Value.Replace(" ", "&nbsp")) – Andreas Vendel Jan 31 '11 at 15:17
  • @Andreas I did. It returns a string, and I got back the same string that I passed in. Do you have any idea how to get the content between the two span tags? http://stackoverflow.com/questions/4851721/is-there-anyway-to-get-the-content-of-span-tag-using-regex-in-c – Malcolm Jan 31 '11 at 15:26
  • Note that this will catch XML tags which are inside strings or javascript code. –  Jan 31 '11 at 20:42
  • @Malcolm I tested the following code: "string html = Regex.Replace(@" testtest", @"\s*", (match) => match.Value.Replace(" ", "&nbsp"));". After it executed, html contained "&nbsp&nbsptesttest". Maybe it is a problem with case? Try adding RegexOptions.IgnoreCase to the replace call. Otherwise I don't know what the problem might be without looking at the code. You might consider starting a new question if you don't get it working. – Andreas Vendel Jan 31 '11 at 20:44
  • @Andreas it worked for me too. tx for all the help. Another thing i don't know it makes much sense or not whether we put &nbsp or nbsp; – Malcolm Feb 01 '11 at 11:00
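
The patterns quoted in the comments above appear to have lost their span tags when the page was rendered. The following is only a sketch of what the exchange seems to describe, under two assumptions that are not confirmed by the thread: that the intended pattern is <span>(\s*)</span>, and that each space inside the span should become one &nbsp; entity while the span itself is dropped. The helper name SpacesToNbsp is hypothetical, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces a whitespace-only span with one &nbsp; per space.
static string SpacesToNbsp(string html) =>
    Regex.Replace(
        html,
        @"<span>(\s*)</span>",   // group 1 captures the whitespace inside the span (assumed pattern)
        match => match.Groups[1].Value.Replace(" ", "&nbsp;"),
        RegexOptions.IgnoreCase);

// Prints: &nbsp;&nbsp;testtest
Console.WriteLine(SpacesToNbsp("<span>  </span>testtest"));
```

On the last question in the comments: the entity is written &nbsp; with the trailing semicolon. Also note that with this sketch a completely empty <span></span> is simply removed, since there are no spaces to convert.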

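My favourite answer to this problem is this one: [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)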

Community
Jack Allan

You should parse it, search for the empty span elements, and replace them. Here's how you can do it using LINQ to XML. Just note that, depending on the actual HTML, it may require tweaks to get it to work, since this is an XML parser, not an HTML parser.

// parse it
var doc = XElement.Parse(theHtml);

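// find the target elements
// (XName has no Equals(string, StringComparison) overload, so compare LocalName;
//  IsEmpty is true only for <span/>, so test for child nodes to catch <span></span> too)
var targets = doc.DescendantNodes()
                 .OfType<XElement>()
                 .Where(e => e.Name.LocalName.Equals("span", StringComparison.OrdinalIgnoreCase)
                          && !e.Nodes().Any()
                          && !e.HasAttributes)
                 .ToList(); // need a copy since the contents will change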

// replace them all
foreach (var span in targets)
    span.ReplaceWith(new XElement("br"));

// get back the html string
theHtml = doc.ToString();

Otherwise, here's some code showing how you can use the HTML Agility Pack to do the same (written in a way that mirrors the other version).

// parse it
var doc = new HtmlDocument();
doc.LoadHtml(theHtml);

// find the target elements
var targets = doc.DocumentNode
                 .DescendantNodes()
                 .Where(n => n.NodeType == HtmlNodeType.Element
                          && n.Name.Equals("span", StringComparison.OrdinalIgnoreCase)
                          && !n.HasChildNodes && !n.HasAttributes)
                 .ToList(); // need a copy since the contents will change

// replace them all
foreach (var span in targets)
{
    var br = HtmlNode.CreateNode("<br />");
    span.ParentNode.ReplaceChild(br, span);
}

// get back the html string
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer);
    theHtml = writer.ToString();
}
Jeff Mercado
  • @yoda: Well, that's where the problem lies. It would require tweaks then. Otherwise, using an actual HTML parser (such as the one in the [HTML Agility Pack](http://htmlagilitypack.codeplex.com/)) will be better. The code will be slightly different, though. – Jeff Mercado Jan 31 '11 at 20:49