Remove HTML formatting in Razor MVC 3

Question

I am using MVC 3 with the Razor view engine.

What I am trying to do

I am making a blog using MVC 3, and I want to remove all HTML formatting tags such as <p>, <b>, <i>, etc.

For this I am using the following code (it does work):

 @{
 post.PostContent = post.PostContent.Replace("<p>", " ");   
 post.PostContent = post.PostContent.Replace("</p>", " ");
 post.PostContent = post.PostContent.Replace("<b>", " ");
 post.PostContent = post.PostContent.Replace("</b>", " ");
 post.PostContent = post.PostContent.Replace("<i>", " ");
 post.PostContent = post.PostContent.Replace("</i>", " ");
 }

I feel that there definitely has to be a better way to do this. Can anyone please guide me on this?

See **http://stackoverflow.com/questions/787932/using-c-sharp-regular-expressions-to-remove-html-tags** for more information or use **Html Agility Pack** for removing. — Alex Yaroshevich, Jul 31 '12 at 07:13

score 23 · Accepted Answer · edited May 23 '17 at 12:22

Thanks Alex Yaroshevich,

Here is what I use now:

// Requires: using System.Text.RegularExpressions;
post.PostContent = Regex.Replace(post.PostContent, @"<[^>]*>", String.Empty);
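
If the same stripping is needed in more than one place, the one-liner could be wrapped in a small helper so the view stays clean. This is only a sketch: the class name `HtmlText` and the method name `StripTags` are made up for illustration and are not part of MVC or of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are assumptions for illustration only.
public static class HtmlText
{
    // Same pattern as above: a '<', any run of non-'>' characters, then '>'.
    private static readonly Regex TagPattern =
        new Regex(@"<[^>]*>", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        // Treat null or empty input as empty output instead of throwing.
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagPattern.Replace(html, string.Empty);
    }
}
```

With this in place, `HtmlText.StripTags("<p>Hello <b>World</b></p>")` gives `Hello World`. Keep in mind that the pattern removes anything that looks like `<...>`, so it suits simple formatting tags but it is not an HTML parser.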

edited May 23 '17 at 12:22 by Community · answered Jul 31 '12 at 07:15 by Yasser Shaikh

score 2 · Answer 2 · answered Aug 01 '12 at 05:53

The regular expression is slow. Use this instead, it's faster:

public static string StripHtmlTagByCharArray(string htmlString)
{
    // The output can never be longer than the input, so one buffer is enough.
    char[] array = new char[htmlString.Length];
    int arrayIndex = 0;
    bool inside = false; // true while we are between '<' and '>'

    for (int i = 0; i < htmlString.Length; i++)
    {
        char c = htmlString[i];
        if (c == '<')
        {
            inside = true;
            continue;
        }
        if (c == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            // Copy only characters that are outside of a tag.
            array[arrayIndex] = c;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}

You can take a look at http://www.dotnetperls.com/remove-html-tags
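
If you want to check the speed difference on your own content rather than take it on faith, something like the following could be used. It is a rough sketch, not a proper benchmark: the class `StripBenchmark` and its method names are invented here, and the numbers will depend on your input and machine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical comparison harness; all names are assumptions for illustration.
public static class StripBenchmark
{
    // The regex approach from the accepted answer.
    public static string StripWithRegex(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", string.Empty);
    }

    // The character-loop approach from this answer.
    public static string StripWithLoop(string html)
    {
        char[] buffer = new char[html.Length];
        int length = 0;
        bool inside = false;
        foreach (char c in html)
        {
            if (c == '<') { inside = true; continue; }
            if (c == '>') { inside = false; continue; }
            if (!inside) { buffer[length++] = c; }
        }
        return new string(buffer, 0, length);
    }

    // Runs the given function repeatedly and returns elapsed milliseconds.
    public static long TimeMs(Func<string, string> strip, string html, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(html);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling `StripBenchmark.TimeMs(StripBenchmark.StripWithRegex, sample, 100000)` and the same with `StripWithLoop` gives two timings to compare. The two functions agree on well-formed markup but not on stray angle brackets: for the input `1 < 2` the regex leaves the text unchanged, while the loop drops everything after the `<`.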

score 0 · Answer 3 · answered Jun 10 '14 at 13:52

Just in case you want to use regex in .NET to strip the HTML tags, the following seems to work pretty well on the source code for this very page. It's better than some of the other answers on this page because it looks for actual HTML tags instead of blindly removing everything between < and >. Back in the BBS days, we typed <grin> a lot instead of :), so removing <grin> is not an option. :)

This solution only removes the tags. It does not remove the contents of those tags in situations where that might be important -- a script tag, for example. You'd see the script, but the script wouldn't execute because the script tag itself gets removed. Removing the contents of an HTML tag is VERY tricky, and practically requires that the HTML fragment be well formed...

Also note the RegexOptions.Singleline option. That's very important for any block of HTML, as there's nothing wrong with opening an HTML tag on one line and closing it on another.

string strRegex = @"</{0,1}(!DOCTYPE|a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|big|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hr|html|i|iframe|img|input|ins|kbd|keygen|label|legend|li|link|main|map|mark|menu|menuitem|meta|meter|nav|noframes|noscript|object|ol|optgroup|option|output|p|param|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr){1}(\s*/{0,1}>|\s+.*?/{0,1}>)";
Regex myRegex = new Regex(strRegex, RegexOptions.Singleline);
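string strTargetString = @"<p>Hello, World</p>";
string strReplace = @"";

// Matching is case-sensitive by default; combine with RegexOptions.IgnoreCase to also catch tags like <P>.
return myRegex.Replace(strTargetString, strReplace); // "Hello, World"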

I'm not saying this is the best answer. It's just an option and it worked great for me.
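
To make the points above concrete, here is a cut-down sketch of the same idea. The tag list is deliberately shortened to six names so it fits on one line, and the class name `KnownTagStripper` is made up for this example; it is not the full pattern from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, shortened variant of the pattern above (illustration only).
public static class KnownTagStripper
{
    // Only these six tag names are recognized here; the real list is much longer.
    private static readonly Regex KnownTags = new Regex(
        @"</?(a|b|i|p|script|span)(\s*/?>|\s+.*?/?>)",
        RegexOptions.Singleline);

    public static string Strip(string html)
    {
        return KnownTags.Replace(html, string.Empty);
    }
}
```

`KnownTagStripper.Strip("<p>Nice one <grin></p>")` gives `Nice one <grin>`, because `grin` is not in the list. A tag whose attributes are split over two lines, such as `"<a href=\"x\"\n   title=\"t\">link</a>"`, comes out as `link` thanks to `RegexOptions.Singleline`. And `"<script>alert(1)</script>"` comes out as `alert(1)`: the tag is gone but its contents stay, as described above.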
