1

I have a variable in C# holding a string like this:

string myText="my text  which contains <div>i am text inside div</div>";

Now I want to replace every "\n" (newline character) with "<br>" in this variable's data, except for the text inside the div.

How do I do this?

Praveen Prasad

5 Answers

2

For something like this you will need to parse the HTML in order to distinguish the parts that you do want to make the replacement in from the ones you don't.

I suggest looking at the HTML Agility Pack. It can parse HTML fragments as well as malformed HTML. You can then query the resulting parse tree using XPath notation and do your replacement on the selected nodes.

Oded
2

Others have suggested using libraries such as HTMLAgilityPack. It is indeed a nice tool, but if you don't need HTML parsing functionality beyond what you have requested, a simple parser should suffice:

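    // Requires: using System; using System.Text;
    string ReplaceNewLinesWithBrIfNotInsideDiv(string input) {

        int divNestingLevel = 0; // number of <div> elements currently open
        StringBuilder output = new StringBuilder();
        // Ordinal comparison: tag names are plain ASCII, no culture rules needed.
        StringComparison comp = StringComparison.OrdinalIgnoreCase;

        for (int i = 0; i < input.Length; i++) {
            if (input[i] == '<') {
                // Opening tag: match only the "<div" prefix so attributes are allowed
                // (note that this also matches any tag name starting with "div").
                if (i < (input.Length - 3) && input.Substring(i, 4).Equals("<div", comp)) {
                    divNestingLevel++;
                } else if (divNestingLevel != 0 && i < (input.Length - 5) && input.Substring(i, 6).Equals("</div>", comp)) {
                    // Closing tag: only counted while at least one div is open.
                    divNestingLevel--;
                }
            }

            // A newline becomes <br/> only when we are outside of every div.
            if (input[i] == '\n' && divNestingLevel == 0) {
                output.Append("<br/>");
            } else {
                output.Append(input[i]);
            }
        }

        return output.ToString();
    }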

This should handle nested divs as well.

jevakallio
  • Nice... I was in the process of writing such an example; you just got done first. This text worked just fine with it, by the way: string test = "My text !\n which contains <div>i am text !\n" + "inside <div>nested div!\n</div>div</div>" + " outside of div !\nDid this work";
    – John Sobolewski Feb 20 '11 at 19:05
1

That would require some fairly complicated RegEx, out of my league.

But you could try splitting the string:

string[] parts = myText.Split(new[] { "<div>", "</div>" }, StringSplitOptions.None);

for (int i = 0; i < parts.Length; i += 2)  // only the even parts
  parts[i] = parts[i].Replace("\n", "<br>");

And then use a StringBuilder to re-assemble the parts.
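
As a rough sketch of the whole split-and-reassemble idea: `Split` drops the delimiters, so they have to be put back by hand. The class and method names below are hypothetical, and the sketch assumes flat (non-nested), attribute-free, properly closed `<div>...</div>` pairs. It uses the `Split(string[], StringSplitOptions)` overload, since there is no overload taking several plain string arguments.

```csharp
using System;
using System.Text;

static class DivSplitSketch
{
    // Hypothetical helper: replaces "\n" with "<br>" outside of <div>...</div>.
    // Assumes divs are not nested, carry no attributes, and are always closed.
    public static string ReplaceOutsideDivs(string input)
    {
        string[] parts = input.Split(new[] { "<div>", "</div>" }, StringSplitOptions.None);
        var output = new StringBuilder();

        for (int i = 0; i < parts.Length; i++)
        {
            if (i % 2 == 0)
            {
                // Even parts lie outside any div: do the replacement.
                output.Append(parts[i].Replace("\n", "<br>"));
            }
            else
            {
                // Odd parts were inside a div: restore the tags, keep the text as-is.
                output.Append("<div>").Append(parts[i]).Append("</div>");
            }
        }

        return output.ToString();
    }
}
```

With nested divs the even/odd alternation no longer holds, which is where a nesting counter or a real HTML parser becomes necessary.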

H H
  • I remember an answer on HTML and RegEx: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – GvS Feb 20 '11 at 19:08
  • @gvs: And that is why Oded has the right answer, in principle. But Praveen might find it a bit overwhelming for 'just' replacing a few newlines. – H H Feb 20 '11 at 19:14
  • myText.Split("<div>", "</div>"); // accepts a character array only to split, unable to split on strings with it!!
    – Praveen Prasad Feb 20 '11 at 22:04
  • @Praveen, I think the `param strings` overload of Split() may be new in Fx4. – H H Feb 20 '11 at 22:17
0

I would split the string on div and then look at the tokens: if a token starts with "div", don't replace \n with <br>. If it does start with div, you need to find the closing div and split on that, then take the second token and do what you just did. Of course, you will have to keep appending the tokens to a master string. I'll code up a sample here in a few minutes...

John Sobolewski
  • You need to escape Html tags here. – H H Feb 20 '11 at 18:46
  • I have the same problem with my basic idea... you have to deal with nested divs and make sure you know what nesting level you are at to determine when you are back outside of the div... – John Sobolewski Feb 20 '11 at 18:49
-1

Use the string.Replace() method like this:

 myText = myText.Replace("\n", "<br>");

You could consider using the Environment.NewLine property to find the newline characters. Are you sure they are not \n\r or \r\n, etc.?
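
If the text may contain mixed line endings, one possible approach is to normalize them to "\n" first so that a single replacement covers every case. This is only a sketch; the class and method names are made up for illustration.

```csharp
using System;

static class NewlineSketch
{
    // Hypothetical helper: converts "\r\n" and lone "\r" to "\n".
    // "\r\n" is handled first so it does not turn into two newlines.
    public static string NormalizeNewlines(string input)
    {
        return input.Replace("\r\n", "\n").Replace("\r", "\n");
    }
}
```

After normalizing, `Replace("\n", "<br>")` sees every line break regardless of which platform produced the text.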

You may have to pull the text inside the div out first if you don't want to parse that. Use a regex to find it and remove it, then do the Replace() as above, then put the strings back together.
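
A variation on this idea, sketched below, avoids physically removing and re-inserting the div text: a single regex matches either a whole div or a newline, and only the newline matches are rewritten. The names are hypothetical, and the pattern assumes divs are not nested; nested divs would need a parser or a nesting counter.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSketch
{
    // Matches either a complete (non-nested) div, attributes allowed, or a newline.
    // Singleline lets "." span line breaks inside the div.
    private static readonly Regex DivOrNewline = new Regex(
        @"<div\b[^>]*>.*?</div>|\n",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: divs are copied through unchanged, bare newlines become <br>.
    public static string ReplaceOutsideDivs(string input)
    {
        return DivOrNewline.Replace(input, m => m.Value == "\n" ? "<br>" : m.Value);
    }
}
```

Because the regex engine consumes a whole div as one match, any newline inside it is never seen as a separate match and is left untouched.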

chrisp_68