
Users enter content in a text editor, and it is then submitted to the database. Before storing it in the database, I want to remove the empty lines at the beginning and end of the content (empty lines in the middle must not be removed).

I want to do this in JavaScript and C#.

Sample content:

<div>
    <p><span><br></span></p>
    <span>a<br/>bc</span>
    <p>te<br>st</p>
    <p>\n<span>\n</span></p>
    <p><span><br/></span></p>
</div>

What I need is:

<div>
    <span>a<br/>bc</span>
    <p>te<br>st</p>
</div>

Can anyone help me?

jessehouwing
artwl
  • Are the `<p>` tags *always* outside `<span>` tags? Can you even rely on the user input tags to be balanced? In your 'sample content', would the line with `\n` be on one (text) line, or would there actually be newlines there in the input string? – mathematical.coffee Mar 14 '12 at 02:21
  • @mathematical.coffee The `\n` and `<br>` (or `<br/>`) are created by the text editor. – artwl Mar 14 '12 at 02:29
  • Well, do you want to use JavaScript or C#? With C# easiest thing to do would be to use an HTML parser and walk the generated tree looking for adjacent empty nodes. Regex won't help you here. – Roman Mar 14 '12 at 02:57
  • We just need to add this link: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – jessehouwing Mar 14 '12 at 08:29
  • Consider using a DOM tree and go recursively through all the nodes, remove those that only contain whitespace (C# `string.IsNullOrWhiteSpace(node.InnerText)`) and you're done. Either the Javascript DOM in the browser or the HTML Agility Pack in C# would let you do this. – jessehouwing Mar 14 '12 at 08:31

2 Answers


Well, if I understand what you are trying to accomplish, this should solve your problem:

        // In a verbatim string, \n below is a literal backslash followed by 'n', not a newline.
        string input = @"
        <div>
            <p><span><br></span></p>
            <span>a<br/>bc</span>
            <p>te<br>st</p>
            <p>\n<span>\n</span></p>
            <p><span><br/></span></p>
        </div>
        ";
        // Matches a <span> holding only <br>, <br/> or a literal \n, with an optional surrounding <p>.
        string pattern = @"(<p>)?(\\n|<br/?>)?<span>(<br/?>|\\n)</span>(</p>)?";
        System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex(pattern);
        string final = reg.Replace(input, String.Empty);
        Console.WriteLine(final);

The above code will return:

<div>

                <span>a<br/>bc</span>
                <p>te<br>st</p>


</div>

You could then trim every line, as it looks like the output needs it.
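Since the question also asks for JavaScript, here is a minimal sketch of that trimming step. It assumes the cleaned markup is available as a plain string; `tidyLines` is a hypothetical helper name, not a built-in. It trims each line and drops blank lines only at the start and end, so blank lines in the middle survive. Note that it also discards the indentation.

```javascript
// Hypothetical helper: trims every line, then removes blank lines
// from the beginning and end only (middle blank lines are kept).
function tidyLines(text) {
  const lines = text.split(/\r?\n/).map(line => line.trim());
  while (lines.length > 0 && lines[0] === '') {
    lines.shift();
  }
  while (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines.join('\n');
}
```

If the indentation matters, one could trim only the right-hand side of each line instead.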

hetelek

It is not mentioned in the question whether you want to clean up your content on the client or server side.

If it should be done on the server, please don't use regex for it. Why? See this excellent answer. Use an HTML parser instead, e.g. HtmlAgilityPack:

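var doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so guard against that.
var nodes = doc.DocumentNode.SelectNodes("//div|//span|//p");
if (nodes != null)
    foreach (var node in nodes)
        // Treat whitespace and literal "\n" sequences as empty content.
        if (string.IsNullOrWhiteSpace(node.InnerText.Replace(@"\n", string.Empty)))
            node.Remove();

var result = doc.DocumentNode.OuterHtml;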

But it could be done even more simply on the client (also without regex) by using jQuery:

var dom = $(html);
// Remove every descendant element whose text is empty or whitespace only.
dom.find('p,span,div').each(function() {
    if ($(this).text().trim() === '')
        $(this).remove();
});

var result = dom.wrap('<div>').parent().html();
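Note that both snippets above remove every empty element, including those in the middle, whereas the question asks to keep those. Below is a minimal sketch of trimming only the leading and trailing empty children. It assumes a browser DOM and that an element counts as empty when its text content is blank once literal `\n` sequences are dropped (so an image-only element would also be treated as empty). `trimEdges` and `trimEmptyEdgeChildren` are hypothetical helper names, not part of jQuery or the DOM API.

```javascript
// Hypothetical helper: returns the items with "empty" ones dropped
// from both ends; anything between the first and last non-empty
// item is kept, even if it is empty.
function trimEdges(items, isEmpty) {
  let start = 0;
  let end = items.length;
  while (start < end && isEmpty(items[start])) start++;
  while (end > start && isEmpty(items[end - 1])) end--;
  return items.slice(start, end);
}

// Hypothetical helper: removes leading and trailing empty child
// elements of a container element, leaving the middle untouched.
function trimEmptyEdgeChildren(container) {
  // Assumption: blank text (ignoring literal "\n" sequences) means empty.
  const isEmpty = el => el.textContent.replace(/\\n/g, '').trim() === '';
  const children = Array.from(container.children);
  const kept = new Set(trimEdges(children, isEmpty));
  children.forEach(el => {
    if (!kept.has(el)) el.remove();
  });
}
```

In a browser this might be called as `trimEmptyEdgeChildren(wrapper)`, where `wrapper` is the outer `<div>` of the submitted content. For the sample in the question, that should leave only the `<span>` and the `<p>te<br>st</p>` children.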
Oleks