I want to be able to take HTML code and render plain text from it.

In other words, this would be my input:

<h3>some text</h3>

I want the result to look like this:

some text

How would I do it?

Alex Gordon

4 Answers

I would suggest trying the HTML Agility Pack for .NET:

Html Agility Pack - Codeplex

Attempting to parse HTML with anything else is, for the most part, unreliable.

Whatever you do, DON'T TRY TO PARSE HTML WITH REGEX!
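
To make the warning concrete, here is a small illustrative sketch (the RegexPitfalls and NaiveStrip names are made up for this example). It applies the common tag-stripping pattern <[^>]*> to two inputs it gets wrong: a '>' inside a quoted attribute value, and an encoded entity that a browser would display as '&'.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexPitfalls
{
    // Naive tag stripper: deletes everything from a '<' up to the next '>'.
    public static string NaiveStrip(string html) =>
        Regex.Replace(html, @"<[^>]*>", string.Empty);

    static void Main()
    {
        // The '>' inside the quoted title ends the "tag" early,
        // so this prints b">text instead of text.
        Console.WriteLine(NaiveStrip("<p title=\"a>b\">text</p>"));

        // Entities stay encoded: this prints Tom &amp; Jerry, not Tom & Jerry.
        Console.WriteLine(NaiveStrip("<b>Tom &amp; Jerry</b>"));
    }
}
```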

Justin Niessner
  • I think that HtmlAgilityPack is not needed for this simple task. See my answer. – sashaeve Apr 14 '10 at 12:51
  • @sashaeve And see my updated answer. For a simple example like this, RegEx might work...but this is just an example. My guess is his real problem is much more complex and that SO post explains IN DEPTH why you can't parse HTML with RegEx. – Justin Niessner Apr 14 '10 at 12:52

Use regex.

using System.Text.RegularExpressions;

String result = Regex.Replace(your_text_goes_here, @"<[^>]*>", String.Empty);
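
If you do take the regex route for small, simple, trusted snippets, note that it leaves character entities such as &amp; and &lt; encoded. A minimal sketch, assuming System.Net.WebUtility.HtmlDecode from the .NET base class library is available; the SimpleHtmlText class and StripTags helper are hypothetical names:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class SimpleHtmlText
{
    // Hypothetical helper: strip tags with the pattern from this answer,
    // then decode entities such as &amp; and &lt; into characters.
    // Only suitable for small, simple, trusted snippets.
    public static string StripTags(string html)
    {
        string noTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(noTags).Trim();
    }

    static void Main()
    {
        Console.WriteLine(StripTags("<h3>some text</h3>"));     // some text
        Console.WriteLine(StripTags("<b>Tom &amp; Jerry</b>")); // Tom & Jerry
    }
}
```

Decoding after stripping is deliberate: text written as &lt;b&gt; is meant to be shown literally, and decoding first would turn it into a tag that the regex would then delete.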
sashaeve
  • @sashaeve: This is not reliable enough to render HTML – James Apr 14 '10 at 12:52
  • @James: Why not? It all depends on how complex the source HTML is. If it is as simple as in the example, this will be enough. – sashaeve Apr 14 '10 at 12:55
  • Yes, maybe so (as I suggested myself); however, I am assuming the real HTML will be a little more complex than the example provided. – James Apr 14 '10 at 13:02
  • Regex will only get you in trouble; just use a proper parser. Your argument that "this will work on the example" doesn't sound right to me. By that logic, string StripHtml(string input){return "some text";} would be a valid answer as well: much simpler, and still no need for regex. Just use Html Agility Pack and save yourself the headaches. – JohannesH Dec 23 '11 at 18:06

You need some form of HTML parser. You could use an existing regex or write your own, but neither approach is 100% reliable. I would suggest a third-party library such as HtmlAgilityPack (I have used it and recommend it).
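
HtmlAgilityPack itself is installed from NuGet. To show the parser-based idea with nothing but the base class library, here is a sketch that swaps in System.Xml.Linq instead (it is not HtmlAgilityPack). It only works under the strong assumption that the input is well-formed XML (XHTML); much real-world HTML is not, and an unclosed <br> or an &nbsp; entity will make XElement.Parse throw. The XhtmlText class and ExtractText method are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

static class XhtmlText
{
    // Parses the markup into a tree and returns the concatenated text of all
    // descendant text nodes. Throws System.Xml.XmlException if the input is
    // not well-formed XML.
    public static string ExtractText(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        return root.Value.Trim();
    }

    static void Main()
    {
        Console.WriteLine(ExtractText("<h3>some text</h3>"));
        // A parser reads the quoted attribute correctly; the '>' inside it is not a tag end.
        Console.WriteLine(ExtractText("<p title=\"a>b\">text</p>"));
    }
}
```

Note that XElement.Value joins text nodes with no separator, so <div><p>Hello</p><p>World</p></div> yields HelloWorld; a dedicated HTML parser remains the more practical choice for real pages.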

James

Poor Man's HTML Parser

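    using System;

    class Program
    {
        static void Main()
        {
            string s =
                @"
                <html>
                <body>
                <h1>My First Heading</h1>
                <p>My first paragraph.</p>
                </body>
                </html>
            ";

            // After splitting on '<', each chunk looks like "tag>text":
            // keep only what follows the '>' and skip whitespace-only runs.
            foreach (var item in s.Split('<'))
            {
                int x = item.IndexOf('>');
                if (x == -1)
                    continue;

                string text = item.Substring(x + 1).Trim();
                if (text.Length > 0)
                    Console.WriteLine(text);
            }
        }
    }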
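
The same split-on-'<' idea can be wrapped in a reusable method. This sketch (the PoorMansParser class and TextRuns method are hypothetical names) also keeps any text that appears before the first tag, which the loop above drops, and returns the trimmed, non-empty text runs. It is still not a real parser: the contents of a <script> element, for example, come back as if they were text.

```csharp
using System;
using System.Collections.Generic;

static class PoorMansParser
{
    // Splits on '<'; in every chunk after the first, the text starts after the first '>'.
    // Returns the trimmed, non-empty text runs in document order.
    public static List<string> TextRuns(string html)
    {
        var runs = new List<string>();
        string[] chunks = html.Split('<');

        // chunks[0] is whatever precedes the first '<', so it is plain text.
        AddIfNotBlank(runs, chunks[0]);

        for (int i = 1; i < chunks.Length; i++)
        {
            int close = chunks[i].IndexOf('>');
            if (close == -1)
                continue; // no '>': not a complete tag (text after a stray '<' is lost)

            AddIfNotBlank(runs, chunks[i].Substring(close + 1));
        }
        return runs;
    }

    static void AddIfNotBlank(List<string> runs, string text)
    {
        string trimmed = text.Trim();
        if (trimmed.Length > 0)
            runs.Add(trimmed);
    }

    static void Main()
    {
        foreach (string run in TextRuns("hello <b>world</b>"))
            Console.WriteLine(run);

        // Not a real parser: the script code is returned as if it were text.
        foreach (string run in TextRuns("<p>Hi</p><script>var x = 1;</script>"))
            Console.WriteLine(run);
    }
}
```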
Pratik Deoghare