52

I am removing text from a string and want to replace each removed line with a blank line.

Some background: I am writing a compare function that compares two strings. It's all working fine, and the results are displayed in two separate web browsers. When I scroll down in the browsers, the strings are different lengths, so I want to replace the text I am removing with blank lines so that both strings have the same number of lines.

In the code below I am looking to count how many lines aDiff.text has.

Here is my code:

public string diff_prettyHtmlShowInserts(List<Diff> diffs)
    {
        StringBuilder html = new StringBuilder();

        foreach (Diff aDiff in diffs)
        {
            string text = aDiff.text.Replace("&", "&amp;").Replace("<", "&lt;")
              .Replace(">", "&gt;").Replace("\n", "<br>"); //&para;
            switch (aDiff.operation)
            {

                case Operation.DELETE:                              
                   //foreach('\n' in aDiff.text)
                   // {
                   //     html.Append("\n"); // Would like to replace each line with a blankline
                   // }
                    break;
                case Operation.EQUAL:
                    html.Append("<span>").Append(text).Append("</span>");
                    break;
                case Operation.INSERT:
                    html.Append("<ins style=\"background:#e6ffe6;\">").Append(text)
                        .Append("</ins>");
                    break;
            }
        }
        return html.ToString();
    }
Gaz Winter
Pomster
  • This works, but I need a new line for each of the old lines; that just makes one new line for a whole string that could be 8 lines – Pomster Jun 25 '12 at 12:33

14 Answers

95

Method 1:

int numLines = aDiff.text.Length
             - aDiff.text.Replace(Environment.NewLine, string.Empty).Length;

Method 2:

int numLines = aDiff.text.Split('\n').Length;

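Method 2 gives the number of lines in the text. Method 1 gives the total length of the removed newline sequences, so divide it by Environment.NewLine.Length and add 1 to get the number of lines.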
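
Applied to the question, a count like this can drive the DELETE branch: emit one `<br>` per line break in the removed text, so both panes keep the same height. This is only a minimal sketch. The names `DeletePadding`, `CountLineBreaks` and `BlankLinesFor` are hypothetical helpers (not part of the diff library), and it assumes every line break ends in '\n', which covers both "\n" and "\r\n".

```csharp
using System;
using System.Text;

public static class DeletePadding
{
    // Replace-based count (Method 1), searching for "\n" so no division is needed.
    public static int CountLineBreaks(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        return text.Length - text.Replace("\n", string.Empty).Length;
    }

    // Builds the HTML padding for a deleted chunk: one <br> per removed line break.
    public static string BlankLinesFor(string deletedText)
    {
        var html = new StringBuilder();
        int breaks = CountLineBreaks(deletedText);
        for (int i = 0; i < breaks; i++)
        {
            html.Append("<br>");
        }
        return html.ToString();
    }
}
```

In the question's switch statement, the DELETE case would then be `html.Append(DeletePadding.BlankLinesFor(aDiff.text));`. It uses the raw `aDiff.text` rather than the escaped `text`, whose newlines have already been turned into `<br>`.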

Rohan Bari
poncha
  • Thanks let me check it out :D – Pomster Jun 25 '12 at 12:34
  • I received this error: "The best overloaded method match for 'string.Split(params char[])' has some invalid arguments" – Pomster Jun 25 '12 at 12:38
  • Sorry, can't upvote yet, need the rep to be higher, but thanks for the help :) – Pomster Jun 26 '12 at 05:46
  • Note that as far as performance is concerned, splitting a string allocates space for the array just so that it can count the final number of elements in the array. This is very inefficient, and if you run that over a big enough input text it will actually generate OutOfMemoryExceptions. @GrahamBedford [Answer](http://stackoverflow.com/a/34903526/90852) below is the most correct one here. – Casey Apr 12 '16 at 14:10
  • @Casey this answer includes two options, one of which is the same as Graham's solution. But it still allocates memory (text.Replace will alloc) – poncha Apr 13 '16 at 06:03
  • Option 1 is not the same as Graham's as far as I can see. Graham divides by Environment.NewLine.Length and adds the first line. Using option 1 on Environment.StackTrace outputs 24 in a case with 13 lines (the count Graham's answer gives), so I agree with Casey. – noontz Oct 16 '17 at 08:56
  • There is a catch. If Environment.NewLine is \r\n, it is 2 characters long and the count comes out doubled. This fixes the problem: int numLines = (aDiff.text.Length - aDiff.text.Replace(Environment.NewLine, string.Empty).Length) / Environment.NewLine.Length; – dkokkinos Feb 07 '21 at 20:48
22

You can also use LINQ to count the newline characters, like this:

int numLines = aDiff.text.Count(c => c == '\n') + 1; // requires: using System.Linq;

Late, but it offers an alternative to the other answers.

CrnaStena
18

A variant that does not allocate new strings or an array of strings:

private static int CountLines(string str)
{
    if (str == null)
        throw new ArgumentNullException("str");
    if (str == string.Empty)
        return 0;
    int index = -1;
    int count = 0;
    while (-1 != (index = str.IndexOf(Environment.NewLine, index + 1)))
        count++;

    return count + 1;
}
Mafii
lokimidgard
  • If a string ends with a newline, this method reports one too many lines. For ``"1\r\n2\r\n3\r\n"`` it reports 4 lines, for ``"1\r\n2\r\n3"`` it reports 3. My expectation is that there is not considered to be an additional line at the end when a string ends with a newline. – Christopher Hamkins Dec 30 '22 at 10:18
8

Inefficient, but still:

var newLineCount = aDiff.text.Split('\n').Length - 1;
nunespascal
  • It doesn't even compile! Use: var newLineCount = aDiff.Text.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries).Length; – ilmatte Mar 19 '13 at 11:06
  • Just use the newline character `\n` – nunespascal Mar 19 '13 at 11:09
  • Sorry, you're right, it compiles. Yet Environment.NewLine translates to the right newline sequence for the platform on which the application is running: http://msdn.microsoft.com/it-it/library/system.environment.newline.aspx – ilmatte Mar 21 '13 at 14:18
6

I did a bunch of performance testing of different methods (Split, Replace, for loop over chars, Linq.Count) and the winner was the Replace method (Split method was slightly faster when strings were less than 2KB, but not much).

But there are two bugs in the accepted answer. The first: when the last line doesn't end with a newline, it isn't counted. The second: if you're reading a file with UNIX line endings on Windows, no lines are counted at all, since Environment.NewLine is \r\n and never occurs in the text (you can always just use \n, since it's the last char of a line ending on both UNIX and Windows).

So here's a simple extension method...

public static int CountLines(this string text)
{
    int count = 0;
    if (!string.IsNullOrEmpty(text))
    {
        count = text.Length - text.Replace("\n", string.Empty).Length;

        // if the last char of the string is not a newline, make sure to count that line too
        if (text[text.Length - 1] != '\n')
        {
            ++count;
        }
    }

    return count;
}
JeremyWeir
  • Well, I think this should be the accepted answer if the performance results are correct. But I still don't understand how a single loop through the string could be slower, and I'm almost sure that it would be faster with unsafe code – rattrapper Apr 06 '20 at 10:00
5
int newLineLen = Environment.NewLine.Length;
int numLines = aDiff.text.Length - aDiff.text.Replace(Environment.NewLine, string.Empty).Length;
if (newLineLen != 0)
{
    numLines /= newLineLen;
    numLines++;
}

Slightly more robust, accounting for the first line that will not have a line break in it.
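
To make the arithmetic concrete, here is a small sketch of the same calculation with the newline sequence passed in explicitly, so the result does not depend on the platform it runs on. The class and method names (`ReplaceLineCounter`, `CountLines`) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class ReplaceLineCounter
{
    // Characters removed by Replace, divided by the newline length, gives the
    // number of line breaks; adding 1 accounts for the first line.
    public static int CountLines(string text, string newline)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(newline))
            throw new ArgumentException("newline must be non-empty", nameof(newline));

        int removed = text.Length - text.Replace(newline, string.Empty).Length;
        return removed / newline.Length + 1;
    }
}
```

For "a\r\nb\r\nc" with "\r\n" as the newline, the text has 7 characters and 3 remain after the replace, so 4 / 2 + 1 = 3 lines.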

seebiscuit
  • Why (when) would `Environment.NewLine.Length` return zero? Quote from https://msdn.microsoft.com/en-us/library/system.environment.newline(v=vs.110).aspx : *A string containing "\r\n" for non-Unix platforms, or a string containing "\n" for Unix platforms.* – poncha Apr 13 '16 at 06:02
  • I have no idea why it would have a length of zero. But when I divide by something that I'm not absolutely 100% sure will not be zero, I check anyway. But yes, you are right: as it stands, on the currently supported platforms it shouldn't be zero. – Graham Bedford Apr 27 '16 at 14:46
4
using System.Text.RegularExpressions;

Regex.Matches(text, "\n").Count

I think counting the occurrences of '\n' is the most efficient way, considering speed and memory usage.

Using Split('\n') is a bad idea because it creates a new array of strings, so it performs poorly, especially when your string gets larger and contains more lines.

Replacing the '\n' character with an empty string and calculating the difference is not efficient either, because it has to do several operations: searching, creating new strings, allocating memory and so on.

You can do just one operation, i.e. search. So you can simply count the occurrences of the '\n' character in the string, as @lokimidgard suggested.

It is worth mentioning that searching for the '\n' character is better than searching for "\r\n" (or Environment.NewLine on Windows), because the former (i.e. '\n') works for both Unix and Windows line endings.
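
The point about line endings can be checked with a small sketch. `CountOccurrences` is a hypothetical helper written for this illustration. It uses an ordinal `IndexOf` search, so it allocates nothing and does not depend on culture settings.

```csharp
using System;

public static class NewlineSearchDemo
{
    // Counts non-overlapping occurrences of `token` in `text` with an ordinal search.
    public static int CountOccurrences(string text, string token)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(token)) return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(token, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += token.Length; // continue after the match just found
        }
        return count;
    }
}
```

For the Unix-style text "a\nb\nc", searching for "\r\n" finds 0 matches while searching for "\n" finds 2. For the Windows-style text "a\r\nb\r\nc", both searches find 2.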

Majid
4

Efficient, and costs the least memory:

Regex.Matches("Your String", System.Environment.NewLine).Count;

Of course, we can also wrap this in an extension method on string:

using System.Text.RegularExpressions;

public static class StringExtensions
{
    /// <summary>
    /// Get the number of line breaks (Environment.NewLine occurrences) in the string.
    /// </summary>
    /// <returns>Number of line breaks</returns>
    public static int LineCount(this string str)
    {
        return Regex.Matches(str, System.Environment.NewLine).Count;
    }
}

reference : µBio, Dieter Meemken

Radian Jheng
4

Late to the party here, but I think this handles all lines, even the last line (at least on Windows):

Regex.Matches(text, "$", RegexOptions.Multiline).Count; 
tmr6183
3

To make things easy, I put the solution from poncha in a nice extension method, so you can use it simply like this:

int numLines = aDiff.text.LineCount();

The code:

/// <summary>
/// Extension class for strings.
/// </summary>
public static class StringExtensions
{
    /// <summary>
    /// Get the number of lines in the string.
    /// </summary>
    /// <returns>Number of lines</returns>
    public static int LineCount(this string str)
    {
        return str.Split('\n').Length;
    }
}

Have fun...

Community
Dieter Meemken
3
public static int CalcStringLines(string text)
{
    int count = 1;
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] == '\n') count++;
    }

    return count;
}

That's the fastest/easiest/no memory allocation way to do it...

2

I benchmarked all the answers.

Stack:

  • BenchmarkDotNet
  • .NET 6
  • Intel Core i7-9700K
  • HTML file with 50 lines

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| Test_Replace | 978.1 ns | 153.54 ns | 8.42 ns | 1.2722 | - | 7984 B |
| Test_IndexOfInCycle | 336.0 ns | 13.42 ns | 0.74 ns | - | - | - |
| Test_CycleOverString | 2,815.7 ns | 148.98 ns | 8.17 ns | - | - | - |
| Test_Split | 1,253.2 ns | 85.83 ns | 4.70 ns | 1.4648 | 0.0477 | 9192 B |
| Test_RegexMatchesCount | 11,221.4 ns | 1,196.62 ns | 65.59 ns | 1.3428 | 0.0305 | 8480 B |
| Test_CountCharUnsafe | 3,054.4 ns | 272.66 ns | 14.95 ns | - | - | - |

The winner is IndexOfInCycle

private static int IndexOfInCycle(string str)
{
    int index = -1;
    int count = 0;
    while (-1 != (index = str.IndexOf('\n', index + 1)))
        count++;
    return count + 1;
}

UPDATE: there were errors in my benchmark, updated the results.

Also, I even tried iterating over the string with unsafe code; it still loses to the IndexOf loop.

Alex from Jitbit
1

You could use Regex. Try this code:

StringBuilder html = new StringBuilder();
//...
int lineCount = Regex.Matches(html.ToString(), Environment.NewLine).Count;
MB_18
0

Here's my version, based on @NathanielDoldersum 's answer but modified to check for empty strings and more accurately count the last line. I consider a string ending with a newline to not have an additional line after that newline; the last line ends at the end of the string in that case.

It's only the third fastest method according to @AlexfromJitbit 's benchmark, but it doesn't allocate any memory.

        /// <summary>
        /// Counts the number of lines in a string. If there is a non-empty
        /// substring beyond the last newline character, it is also counted as a
        /// line, but if the string ends with a newline, it is not considered to have
        /// a final line after that newline.
        /// Empty and null strings are considered to have no lines.
        /// </summary>
        /// <param name="str">The string whose lines are to be counted.</param>
        /// <returns>The number of lines in the string.</returns>
        public static int countLines(string str)
        {
            if (string.IsNullOrEmpty(str))
            {
                return 0;
            }
            int count = 0;
            for (int i = 0; i < str.Length; i++)
            {
                if (str[i] == '\n') count++;
            }
            if (str.EndsWith("\n"))
            {
                return count;
            }
            return count + 1;
        }

Here's an xUnit theory for it (all cases pass, of course):

        [Theory]
        [InlineData("1", 1)]
        [InlineData("1\n", 1)]
        [InlineData("1\r\n", 1)]
        [InlineData("1\n2\n3\n", 3)]
        [InlineData("1\n2\n3", 3)]
        [InlineData("1\r\n2\r\n3\r\n", 3)]
        [InlineData("1\r\n2\r\n3", 3)]
        [InlineData(null, 0)]
        [InlineData("", 0)]
        public void countLinesReturnsExpectedValue(string str, int expected)
        {
            Assert.Equal(expected, CUtils.countLines(str));
        }
Christopher Hamkins