
I have a C# list that contains many values like this:

<b>Moon</b>

and I want to remove the <b> and </b> tags.

The result I want is just: Moon

How can I remove these tags from every item in the list?

Darin Dimitrov
Pankaj Mishra
  • Your post appears to have been mangled by the formatting code, hard to tell what you started with... – fyjham Nov 27 '09 at 14:04

5 Answers


You can use XDocument (LINQ to XML) to remove the XML tags, provided each value is well-formed XML with a single root element:

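using System.Xml.Linq;

// Parses the input as XML and returns the concatenated text of every descendant text node.
string StripXmlTags(string xml)
{
    XDocument doc = XDocument.Parse(xml);
    return doc.Root.Value;
}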

Example:

[Test]
public void Test()
{
    string xml = "<root><b>nice </b><c>node</c><d><e> is here</e></d></root>";
    string result = StripXmlTags(xml);

    Assert.AreEqual("nice node is here", result);
}
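XDocument.Parse throws an XmlException unless the input is well-formed XML with exactly one root element, so a value such as `<b>The</b> <i>Man</i>` (two top-level elements) cannot be passed to StripXmlTags directly. A minimal sketch, assuming the list items are otherwise well-formed XML fragments: wrap each value in a placeholder root element before parsing, and pass LoadOptions.PreserveWhitespace so that whitespace-only text between elements (the space in `<b>The</b> <i>Man</i>`) is kept rather than discarded. The `XmlStripper` class, its method names and the `<root>` wrapper are illustrative, not part of the original answer.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlStripper
{
    // Hypothetical helper: wraps the fragment in a dummy <root> element so that
    // inputs with several top-level elements, or plain text around tags, still parse.
    public static string StripXmlFragment(string fragment)
    {
        // PreserveWhitespace keeps whitespace-only text nodes between elements.
        XDocument doc = XDocument.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);
        // XElement.Value concatenates the text of all descendant text nodes.
        return doc.Root.Value;
    }

    // Returns null instead of throwing when the value is not well-formed XML,
    // for example an unclosed tag such as "<b>Moon" or a bare '&'.
    public static string TryStripXmlFragment(string fragment)
    {
        try
        {
            return StripXmlFragment(fragment);
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

Returning null is only one possible policy for malformed items; falling back to a regex-based strip is another.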
Elisha

Try this:

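using System.Text.RegularExpressions;

var moonHtml = "<b>Moon</b>";
// Caution: ".*" is greedy and matches the whole string here; "</?(.*?)>" matches each tag.
var regex = new Regex("</?(.*)>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var moon = regex.Replace(moonHtml, string.Empty);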
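Because `.*` is greedy, the pattern `</?(.*)>` runs from the first `<` to the last `>`, so for `<b>Moon</b>` it removes the entire string and returns an empty result instead of `Moon`. A minimal sketch comparing that pattern with its lazy variant `.*?`; the `GreedyVsLazy` class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy: ".*" consumes to the end of the input and backtracks only as far
    // as the LAST '>', so one match spans from the first '<' to the last '>'.
    public static string StripGreedy(string input) =>
        Regex.Replace(input, "</?(.*)>", string.Empty);

    // Lazy: ".*?" stops at the FIRST '>', so every tag is matched on its own.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "</?(.*?)>", string.Empty);
}
```

For example, `StripGreedy("abc <b>Moon</b> more text <b>more moon</b>")` leaves only `"abc "`, while `StripLazy` on the same input returns `"abc Moon more text more moon"`.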
Sani Huttunen
  • 12 secs faster, what a shame ;) – Elephantik Nov 27 '09 at 14:05
    Why specify "zero or one /" `/?` when the slash would've been included in the dot that follows? Why specify ignore-case when there are no alphabetic characters? Best practice? Oh well. Your code is greedy. If there's a string like `abc Moon more text more moon` then you'll just end up with "abc ". – David Hedlund Nov 27 '09 at 14:08
    DO NOT use regex to parse html - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – thecoop Nov 27 '09 at 14:29

Try this:

Regex.Replace("<b>Moon</b>", @"\<.+?\>", "")
Elephantik
string noHtml = Regex.Replace(inputWithHtmlTags, "<[^>]+>", "");
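The question asks about a whole list rather than a single string. A minimal sketch, assuming the values are held in a `List<string>`: construct the regex from this answer once and apply it to every item with LINQ. `TagStripper` and `StripAll` are illustrative names, not an existing API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Constructed once and reused for every item. "<[^>]+>" matches '<',
    // one or more characters that are not '>', then the closing '>'.
    private static readonly Regex Tag = new Regex("<[^>]+>");

    // Returns a new list with the tags removed from every item;
    // the input sequence itself is not modified.
    public static List<string> StripAll(IEnumerable<string> items) =>
        items.Select(item => Tag.Replace(item, string.Empty)).ToList();
}
```

Calling `TagStripper.StripAll(myList)` on a list containing `"<b>Moon</b>"` yields a list containing `"Moon"`.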
David Hedlund

This program is a very crude illustration of a regex that removes all tags, so it also strips italic and underline tags. The output from running it is "The Man on the Moon". I use the lazy .*? (zero or more characters, as few as possible) so that each match stops at the first > instead of the last one, and so that empty brackets such as <> are removed as well. The IgnoreCase option makes no difference here, because the pattern contains no letters (<b> and <B> are matched either way), and Multiline only changes how ^ and $ behave; it does not make . match a newline.

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "<b>The</b> <i>Man</i> on the <U><B>Moon</B></U>";

            // Lazy ".*?" ends each match at the first '>', so every tag is removed separately.
            var regex = new Regex("<.*?>", RegexOptions.IgnoreCase | RegexOptions.Multiline);

            var output = regex.Replace(input, string.Empty);

            Console.WriteLine(output);
            // Keeps the console window open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
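By default `.` does not match a newline, so `<.*?>` leaves behind a tag whose attributes are split across lines, such as `<b` followed by a line break and `class="x">`. A minimal sketch, assuming such input can occur: either use a negated character class, which does match `\n`, or add RegexOptions.Singleline. The `MultilineTags` class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class MultilineTags
{
    // "." does not match '\n' without RegexOptions.Singleline,
    // so a tag split across lines survives this pattern.
    public static string StripDotLazy(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    // A negated class such as [^>] also matches '\n',
    // so this removes tags that span several lines.
    public static string StripNegatedClass(string input) =>
        Regex.Replace(input, "<[^>]*>", string.Empty);

    // Alternatively keep "<.*?>" and let "." match newlines as well.
    public static string StripDotSingleline(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
}
```

With the input `"<b\nclass=\"x\">Moon</b>"`, `StripDotLazy` removes only the closing tag, while the other two methods return `"Moon"`.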