11

I have a string of unknown length.

It is in the format:

\nline
\nline
\nline

Without knowing how long it is, how can I take just the last 10 lines of the string, where lines are separated by "\n"?

Simon MᶜKenzie
user1588670

6 Answers

14

As the string gets larger, it becomes more important to avoid processing characters that don't matter. Any approach using string.Split is inefficient, as the whole string will have to be processed. An efficient solution will have to run through the string from the back. Here's a regular expression approach.

Note that it returns a List<string>: the matches are found last line first, so each one is inserted at the front of the list (hence the use of the Insert method) to restore the original order before the results are returned.

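// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private static List<string> TakeLastLines(string text, int count)
{
    List<string> lines = new List<string>();
    // RightToLeft makes the regex engine start matching at the end of the string;
    // Multiline lets ^ and $ match at the start and end of every line.
    Match match = Regex.Match(text, "^.*$", RegexOptions.Multiline | RegexOptions.RightToLeft);

    // Matches arrive last line first, so stop as soon as `count` lines are collected.
    while (match.Success && lines.Count < count)
    {
        // Insert at the front to restore the original line order.
        lines.Insert(0, match.Value);
        match = match.NextMatch();
    }

    return lines;
}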
Simon MᶜKenzie
  • I can't upvote, but after trying all the solutions this is the way to go; it is hella fast. Thank you Simon, you are an awesome programmer. – user1588670 Aug 14 '12 at 19:51
  • @SimonMcKenzie Nice solution. RegEx is a very powerful but often overlooked feature of C#. – MikeKulls Aug 15 '12 at 01:13
9
var result = text.Split('\n').Reverse().Take(10).ToArray();
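A comment on this answer points out that the one-liner returns the lines in reverse order. A minimal sketch of an order-preserving variant follows; the helper name `LastLines` and the class `LastLinesDemo` are made up for illustration and are not part of the original answer. It still splits the whole string, so it trades efficiency for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LastLinesDemo
{
    // Hypothetical helper: returns the last `count` lines in their original order.
    public static IEnumerable<string> LastLines(string text, int count)
    {
        // Split allocates an array holding every line; only the last `count` are kept.
        // The second Reverse undoes the first, restoring the original order.
        return text.Split('\n').Reverse().Take(count).Reverse();
    }

    static void Main()
    {
        string text = "\nline1\nline2\nline3\nline4";
        foreach (string line in LastLines(text, 2))
            Console.WriteLine(line); // prints line3, then line4
    }
}
```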
Volma
  • 3
    +1, although this will reverse the order of the lines, which may be unintended. You could append another `Reverse` at the end. The `ToArray()` is redundant since OP hasn't mentioned that he wants an array. – Tim Schmelter Aug 13 '12 at 22:10
  • @codesparkle: Because `Skip` enumerates the whole (huge) array, just to take the last 10 elements. `Reverse.Take` will be implemented like a `For-loop` which only loops the last 10 elements in reverse order, which is more efficient and also somewhat more readable. – Tim Schmelter Aug 13 '12 at 23:22
  • @TimSchmelter learning new things about LINQ every day ;) thanks for the explanation. – Adam Aug 13 '12 at 23:48
  • @user1588670 Considering you mentioned that the input string is "HUGE", this is simply not the correct answer. You need a solution that doesn't make a copy of the data. I believe this will actually make 2 copies of the data depending on how Reverse is implemented. – MikeKulls Aug 13 '12 at 23:51
  • @MikeKulls: 1. please don't downvote competitive answers. 2. `Enumerable.Reverse` is implemented using deferred execution(it does only enumerate the 10 lines if you execute the query) and does not create a new collection. You might want to look two comments above. If he removes the `ToArray` the only new object in memory is the `String-Array` from the split. – Tim Schmelter Aug 14 '12 at 00:23
  • @TimSchmelter I apologize if I have broken any rules. Is it really that bad to do this? I think creating an entire copy of the data is quite a problem so warrants a down vote. The stack overflow blog I read yesterday said if you don't like an answer then downvote it. As for reverse I did say "depending on how it is implemented". In some cases it must create a buffer but I presume you are saying it is optimised for arrays. – MikeKulls Aug 14 '12 at 00:48
  • @Mike "creating an entire copy of the data is quite a problem so warrants a down vote" – Volma Aug 14 '12 at 01:18
  • @Volma are you asking me why I wrote that? The split function will take the original string, which the OP said was "HUGE", and it will make a copy of that entire string, doubling the amount of memory required. If the machine is low on ram then it will need to use the hard drive for this which could take a while. A more efficient method in comparison would run almost instantly and use almost no additional ram. – MikeKulls Aug 14 '12 at 01:21
  • 1
    @Mike, you are absolutely right, it's obvious that my answer will not give optimal performance. However, I cannot agree with generality of your statement: creating a copy may or may not be a problem - depending on the length of the string and how often this operation is performed. Fewer lines of code to maintain sometimes is more important than premature optimization. The question is stated in the simplest form, without any context or any performance requirements to consider. Therefore the simplest solution that does the job is a valid one, and may even be the best one. – Volma Aug 14 '12 at 01:34
  • @Mike BTW, I hope you realize that the solution you provided is very far from giving optimal performance. A better algorithm would be based on scanning part of the string starting from the end, and it wouldn't involve any iterators. But we weren't looking for the fastest algorithm, were we? – Volma Aug 14 '12 at 01:44
  • @Volma I understand what you are saying; writing this as efficiently as possible would require a fair bit more code (IMO the optimal solution has not been posted, including my answer). However, given that the OP said the input string was "HUGE", I can't agree with making an entire copy. EDIT: Yes, I realise my solution is not optimal. This post was written before I saw your comment, so I agree completely. – MikeKulls Aug 14 '12 at 01:44
  • Doesn't your algorithm make a copy of the string as it moves along the string? The only difference is after making a copy of segment 1 it will discard the copy as it moves past segment 10 (items.RemoveAt(0)). I don't believe the poster mentioned HUGE string before I gave my answer. – Volma Aug 14 '12 at 01:51
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15302/discussion-between-mikekulls-and-volma) – MikeKulls Aug 14 '12 at 01:52
  • @TimSchmelter You meant `Split` instead of `Skip` there, right? – julealgon Oct 02 '13 at 18:23
  • @julealgon: I assume you mean my second comment above according to codesparkles (deleted!) comment? I think he has mentioned another approach using `Enumerable.Skip` and he has deleted his comment later. – Tim Schmelter Oct 02 '13 at 19:41
  • @TimSchmelter Exactly. It boils down to this quote, 'Because `Skip` enumerates the whole (huge) array...', which I think is wrong if you are actually talking about `Enumerable.Skip`. – julealgon Oct 02 '13 at 19:54
  • @julealgon: Why? As opposed to `Enumerable.Reverse`, `Enumerable.Skip` is not optimized in a way that it uses a `for-loop` if it's an `ICollection`. It will enumerate the sequence. – Tim Schmelter Oct 02 '13 at 20:43
  • @TimSchmelter Ah ok ok, I misunderstood you. For some reason, I thought you meant that `Skip` traversed the whole collection, instead of stopping after reaching the count. My bad ;) – julealgon Oct 02 '13 at 21:19
6

Split() the string on \n, and take the last 10 elements of the resulting array.
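A minimal sketch of that idea, assuming a hypothetical helper named `LastLines` in a made-up class `SplitTailDemo`: it splits the string, then copies only the tail of the resulting array, clamping the count so that short inputs do not throw.

```csharp
using System;

static class SplitTailDemo
{
    // Hypothetical helper: copies the last `count` elements of the split array.
    public static string[] LastLines(string text, int count)
    {
        string[] all = text.Split('\n');
        // Clamp so that asking for more lines than exist returns everything.
        int take = Math.Min(Math.Max(count, 0), all.Length);
        string[] tail = new string[take];
        Array.Copy(all, all.Length - take, tail, 0, take);
        return tail;
    }

    static void Main()
    {
        string text = "\na\nb\nc";
        Console.WriteLine(string.Join("|", LastLines(text, 2))); // prints b|c
    }
}
```

Like any Split-based approach, this processes the entire string, so it suits small inputs better than very large ones.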

Robert Harvey
3

If this is in a file and the file is particularly large, you may want to do this efficiently. A way to do it is to read the file backwards, and then only take the first 10 lines. You can see an example of using Jon Skeet's MiscUtil library to do this here.

var lines = new ReverseLineReader(filename);
var last = lines.Take(10);
Community
yamen
0

Here's one way to do it that has the advantage of not creating copies of the entire source string, so it is fairly efficient. Most of the code would be placed in a class along with other general-purpose extension methods, so the end result is that you can do it with one line of code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string x = "a\r\nb\r\nc\r\nd\r\ne\r\nf\r\ng\r\nh\r\ni\r\nj\r\nk\r\nl\r\nm\r\nn\r\no\r\np";
            foreach(var line in x.SplitAsEnumerable("\r\n").TakeLast(10))
                Console.WriteLine(line);
            Console.ReadKey();
        }
    }

    static class LinqExtensions
    {
        public static IEnumerable<string> SplitAsEnumerable(this string source)
        {
            return SplitAsEnumerable(source, ",");
        }

        public static IEnumerable<string> SplitAsEnumerable(this string source, string seperator)
        {
            return SplitAsEnumerable(source, seperator, false);
        }

        public static IEnumerable<string> SplitAsEnumerable(this string source, string seperator, bool returnSeperator)
        {
            if (!string.IsNullOrEmpty(source))
            {
                int pos = 0;
                do
                {
                    // Ordinal comparison: the separator must match exactly (no culture rules or case folding).
                    int newPos = source.IndexOf(seperator, pos, StringComparison.Ordinal);
                    if (newPos == -1)
                    {
                        yield return source.Substring(pos);
                        break;
                    }
                    yield return source.Substring(pos, newPos - pos);
                    if (returnSeperator) yield return source.Substring(newPos, seperator.Length);
                    pos = newPos + seperator.Length;
                } while (true);
            }
        }

        public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
        {
            List<T> items = new List<T>();
            foreach (var item in source)
            {
                items.Add(item);
                if (items.Count > count) items.RemoveAt(0);
            }
            return items;
        }
    }
}

EDIT: It has been pointed out that this could be more efficient, because it iterates over the entire string. I also think that RemoveAt(0) on a list is probably inefficient. To resolve this, the code could be modified to search through the string backwards, which would eliminate the need for the TakeLast function, as we could just use Take.
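The backwards search described in the edit could look roughly like the following. This is only a sketch, not the code from the answer: `SplitFromEnd` and `ReverseSplitDemo` are hypothetical names, and the separator is assumed to be non-empty. Lines come out last first, so `Take` followed by `Reverse` gives the final lines in their original order, and only the lines actually consumed are copied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReverseSplitDemo
{
    // Hypothetical helper: lazily yields lines starting from the END of the string.
    // Assumes `separator` is non-empty.
    public static IEnumerable<string> SplitFromEnd(string source, string separator)
    {
        if (string.IsNullOrEmpty(source)) yield break;

        int end = source.Length; // exclusive end of the line currently being extracted
        while (true)
        {
            // Look backwards for the separator that precedes the current line.
            int sepPos = end >= separator.Length
                ? source.LastIndexOf(separator, end - 1, StringComparison.Ordinal)
                : -1;

            if (sepPos < 0)
            {
                // No earlier separator: the remainder is the first line.
                yield return source.Substring(0, end);
                yield break;
            }

            int lineStart = sepPos + separator.Length;
            yield return source.Substring(lineStart, end - lineStart);
            end = sepPos;
        }
    }

    static void Main()
    {
        string x = "a\r\nb\r\nc\r\nd";
        // Take(2) stops the scan after two lines; Reverse restores the original order.
        var lastTwo = SplitFromEnd(x, "\r\n").Take(2).Reverse();
        Console.WriteLine(string.Join(",", lastTwo)); // prints c,d
    }
}
```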

MikeKulls
0

A space-efficient approach:

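    private static void PrintLastNLines(string str, int n)
    {
        // Note: this treats the separator as the two literal characters '\' and 'n',
        // not as a single newline character.
        if (string.IsNullOrEmpty(str) || n <= 0)
        {
            return;
        }

        int idx = str.Length - 1;
        int newLineCount = 0;
        int start = 0; // first character after the most recently found separator

        // Scan backwards; stop at the start of the string to avoid reading out of range.
        while (idx > 0 && newLineCount < n)
        {
            if (str[idx] == 'n' && str[idx - 1] == '\\')
            {
                newLineCount++;
                start = idx + 1;
                idx--;
            }

            idx--;
        }

        // If fewer than n separators exist, print the whole string.
        PrintFromIndex(str, newLineCount == n ? start : 0);
    }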

    private static void PrintFromIndex(string str, int idx)
    {
        for (int i = idx; i < str.Length; i++)
        {
            if (i < str.Length - 1 && str[i] == '\\' && str[i + 1] == 'n')
            {
                Console.WriteLine();
                i++;
            }
            else
            {
                Console.Write(str[i]);
            }
        }

        Console.WriteLine();
    }
Omar Salem