0

I found this post on selecting a range from an array, and have to use the LINQ option:

Selecting a range of items inside an array in C#

Ultimately, I'm trying to get the last four lines from a text file. After I've read the file in and stripped unwanted characters and empty lines, I have an array with all of the lines. I'm using the following to do so:

string[] allLines = GetEachLine(results);
string[] lastFourLines = allLines.Skip(allLines.Length - 4).Take(4).ToArray();

This works fine, but I'm wondering if I could somehow skip assigning to the allLines variable altogether. Such as:

string[] lastFourLines = GetEachLine(results).Skip(returnedArrayLength - 4).Take(4).ToArray();
hallibut
  • what is the type and nature of results parameter? – sairfan Dec 15 '21 at 19:22
  • 1
    What happened when you tried the code you want to use? – Jon Skeet Dec 15 '21 at 19:24
  • It's a string. It comes from calling File.ReadAllText(myFile). The string is everything that's in the file. GetEachLine is a method I wrote that ultimately gets rid of unwanted characters and lines, then returns a string[] with each element being a line from the file. – hallibut Dec 15 '21 at 19:25
  • So the top code works. In the bottom code, I don't know how to get the length of the array returned by GetEachLine() without assigning it to a variable first. – hallibut Dec 15 '21 at 19:26
  • As noted in some other answers, you already gave up efficiency when you chose to return `string[]` from `GetEachLine`. (BTW, what is the type of `results`?) At that point your first code works fine, except that it runs through all the elements in the array to get to the last 4. It is much better to use your knowledge that it is an array to get the last four elements directly. – NetMage Dec 16 '21 at 20:24

3 Answers

2

It would be better to change GetEachLine and the code preceding it (however results is computed) to use IEnumerable<T>, so that you avoid reading the entire file into an array in memory just for the last four lines (unless you use all of results for something else). Consider using File.ReadLines.

However, if you are using .NET Core 2.0 or greater, you can use Enumerable.TakeLast to efficiently return the last four lines:

var lastFourLines = GetEachLine(results).TakeLast(4);
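
A minimal sketch of the streaming approach described above, combining File.ReadLines with TakeLast. The cleanup rules inside this GetEachLine are an assumption (trim whitespace, drop empty lines), since the real method is not shown in the question, and LineFilter and LastLines are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineFilter
{
    // Hypothetical stand-in for the question's GetEachLine. It takes and
    // returns IEnumerable<string>, so lines are cleaned one at a time
    // instead of being collected into an array first.
    // Assumed cleanup: trim whitespace and drop empty lines.
    public static IEnumerable<string> GetEachLine(IEnumerable<string> rawLines) =>
        rawLines.Select(line => line.Trim())
                .Where(line => line.Length > 0);

    // File.ReadLines streams the file lazily; TakeLast buffers only the
    // most recent `count` lines while the sequence is enumerated.
    public static string[] LastLines(string path, int count) =>
        GetEachLine(File.ReadLines(path)).TakeLast(count).ToArray();
}
```

This still reads the whole file once, but it never holds more than a handful of lines in memory, and it returns fewer than `count` lines without throwing when the file is short.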
NetMage
  • Can you elaborate on what you mean by using IEnumerable to avoid using an array? – hallibut Dec 19 '21 at 04:26
  • @hallibut It would be better if `GetEachLine` returned `IEnumerable` instead of `string[]`. I would need to see the code of `GetEachLine` to be specific on changes. – NetMage Dec 23 '21 at 00:43
1

If GetEachLine() returns string[], then that should work fine, though a null check may be needed.

As you chain more you may want to use line breaks to increase readability:

string[] lastFourLines = GetEachLine(results)
    .Skip(allLines.Length - 4)
    .Take(4)
    .ToArray();

allLines.Length won't exist unless you still have line 1 from your question. You can avoid calling GetEachLine() twice by using TakeLast():

string[] lastFourLines = GetEachLine(results)
    .TakeLast(4)
    .ToArray();
DCAggie
  • I guess what I'm trying to ask is, is there a way to not have to assign to allLines first? I can call .Skip on the return of GetEachLine because it returns a string[], but I'd like to pull the length of the returned array without assigning it to a variable first. – hallibut Dec 15 '21 at 19:28
  • I've updated the answer to address the issue of calling GetEachLine() twice – DCAggie Dec 15 '21 at 19:32
  • Ah interesting; I'm curious, is reversing a string a performance hit or does it affect performance in a trivial way? I know I could always benchmark and see, but since I asked this question as a matter of elegance rather than necessity, I'd like to understand the balance. – hallibut Dec 15 '21 at 19:48
  • 2
    @hallibut It involves copying the entire contents of the sequence into a new data structure, meaning you'll now have copied the entire contents of the file into two separate data structures, holding both in memory at the same time, when you only need 4 lines. Unless the file is *always* small, that's likely a significant time and memory cost over a more optimized solution. – Servy Dec 15 '21 at 19:57
  • @Servy There isn't a guarantee that the file is always small. So it's likely better to use my first method than call reverse twice to save a line and a variable? – hallibut Dec 15 '21 at 20:22
  • 1
    @hallibut You copy the entire contents of the file to a data structure once, which is half as much as this solution, but still once more than you need to to solve this problem. You never actually need more than 4 lines in memory at any one time to solve this problem. – Servy Dec 15 '21 at 20:29
  • Maybe I've misunderstood your question. If the question is, as I assumed, "can I chain LINQ directly from the call to my method", the answer is yes and here is an example. If the question is whether you *should* do it this way, then the answer would be different. Assuming you have C# 9 as an option and you need to process the entire file to do cleanup and checks, then I'd just stick with the range option, i.e. `GetEachLine(results)[^4..^0];` – DCAggie Dec 15 '21 at 21:33
  • @hallibut Note that the `Reverse` _is not_ reversing a `String`: it is reversing the order of processing of the lines. – NetMage Dec 16 '21 at 20:22
  • @DCAggie The problem with your solution is you get an exception when `GetEachLine` returns an array with fewer than 4 elements. – NetMage Dec 16 '21 at 20:24
  • @NetMage - Yes, while `Take(4)` from my answer will work even when the returned array has fewer than 4 elements, the `^4..^0` range from my previous comment will throw an out of range exception if there are fewer than 4 elements. More importantly, I'm an idiot and `TakeLast()` appears to be a thing. I've updated my answer to use this instead of `Reverse()` and upvoted your answer – DCAggie Dec 16 '21 at 23:01
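
The short-array problem raised in these comments can be sidestepped while still using range syntax by clamping the start index. This is only a sketch: LastFourDemo and LastOrFewer are hypothetical names, and it assumes C# 8 or later on a runtime that supports ranges on arrays:

```csharp
using System;
using System.Linq;

static class LastFourDemo
{
    // Hypothetical helper: returns the last `count` elements, or the whole
    // array when it holds fewer than `count` elements. Clamping the start
    // index at 0 avoids the exception that `lines[^4..]` raises on short arrays.
    public static string[] LastOrFewer(string[] lines, int count)
    {
        int start = Math.Max(0, lines.Length - count);
        return lines[start..];
    }
}
```

For an array this copies only the elements it returns, whereas `TakeLast(4).ToArray()` gives the same result for any sequence type.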
0

If you are looking to efficiently retrieve the last N (filtered) lines of a large file, you really need to start at the point where you are reading the file contents.

Consider a 1GB log file containing 10M records, where you only want the last few lines. Ideally, you would want to start by reading the last couple KB and then start extracting lines by searching for line breaks from the end, extracting each line and returning them in an iterator yield. If you run out of data, read the preceding block. Continue only as long as the consumer requests more values from the iterator.

Offhand, I don't know a built-in way to do this, and coding this from scratch could get pretty involved. Luckily, a search turned up this similar question having a highly rated answer.
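
As a rough illustration of the block-wise idea described above (not the code from the linked answer), here is a minimal sketch. It assumes UTF-8 or ASCII text with "\n" or "\r\n" line endings and no byte-order mark; TailReader and ReadLinesReverse are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TailReader
{
    // Lazily yields the file's lines from last to first, reading fixed-size
    // blocks backwards from the end of the file.
    // Assumption: UTF-8/ASCII text, so the byte 0x0A only ever means '\n'.
    public static IEnumerable<string> ReadLinesReverse(string path, int blockSize = 4096)
    {
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
        var pending = new List<byte>();  // bytes of the current line, last byte first
        var buffer = new byte[blockSize];
        long position = stream.Length;
        bool atEnd = true;               // true until the final byte has been examined

        while (position > 0)
        {
            int toRead = (int)Math.Min(blockSize, position);
            position -= toRead;
            stream.Seek(position, SeekOrigin.Begin);

            // Stream.Read may return fewer bytes than requested, so loop.
            int read = 0;
            while (read < toRead)
            {
                int n = stream.Read(buffer, read, toRead - read);
                if (n == 0) break;
                read += n;
            }

            for (int i = read - 1; i >= 0; i--)
            {
                if (buffer[i] == (byte)'\n')
                {
                    // A newline as the very last byte terminates the final
                    // line rather than starting an extra empty one.
                    if (!atEnd) yield return Decode(pending);
                    pending.Clear();
                }
                else
                {
                    pending.Add(buffer[i]);
                }
                atEnd = false;
            }
        }

        // Whatever remains is the first line of the file.
        if (!atEnd) yield return Decode(pending);
    }

    // Restores byte order, drops a trailing '\r', and decodes as UTF-8.
    private static string Decode(List<byte> reversed)
    {
        byte[] bytes = reversed.ToArray();
        Array.Reverse(bytes);
        int length = bytes.Length;
        if (length > 0 && bytes[length - 1] == (byte)'\r') length--;
        return Encoding.UTF8.GetString(bytes, 0, length);
    }
}
```

A consumer could then filter and stop early, for example `TailReader.ReadLinesReverse(path).Where(l => l.Trim().Length > 0).Take(4).Reverse()` with System.Linq, so only the blocks needed to produce four lines are ever read.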

T N