-3

I am loading a file using File.ReadLines method (Files could get very large so I used this rather than ReadAllLines)

I need to access each line and perform an action on it. So my code is like this

IEnumerable<String> lines = File.ReadLines("c:\myfile.txt", new UTF8Encoding());

StringBuilder sb = new StringBuilder();

int totalLines = lines.Count();  //used for progress calculation

//use for instead of foreach here - easier to know the line I'm on for progress percent complete calculation
for(int i = 0; i < totalLines; i++){

    //for example get the line and do something
    sb.Append(lines.ElementAt(i) + "\r\n");

    //get the line again using ElementAt(i) and do something else
    //...ElementAt(I)...
}

So my bottleneck is each time I access ElementAt(i)because it has to iterate over the entire IEmumerable to get to position i.

Is there any way to keep using File.ReadLines, but improve this somehow?

EDIT - the reason I count at the beginning is so I can calculate progress complete for display to the user. Which is why I removed foreach in favor of the for.

erotavlas
  • 4,274
  • 4
  • 45
  • 104
  • Can you not just enumerate it...? `foreach(var line in lines) {...} ` – pinkfloydx33 Jan 10 '17 at 21:28
  • Did you look at the documentation for `File.ReadLines`? – Servy Jan 10 '17 at 21:29
  • 1
    Do you really need the count to start with? If you don't, you should really avoid calling `Count()` and *then* iterating. – Jon Skeet Jan 10 '17 at 21:29
  • @JonSkeet see my edit, it was for calculating progress. I had foreach originally but it wasn't clear how far into the file I was at the time. – erotavlas Jan 10 '17 at 21:38
  • @erotavlas If reading through all of the lines in the file once to get the count is worth it to be able to know how far through the file you are when processing, then you can choose to compute the count first. The decision as to whether or not it's worth it is up to you. – Servy Jan 10 '17 at 21:40
  • @Servy I have no choice, I have to count the lines if I want to calculate progress - but that is not the bottleneck since it only occurs once - my issue is the attempt to access individual lines using ElementAt() – erotavlas Jan 10 '17 at 21:43
  • You can use a foreach and increment `i` manually. – Kalten Jan 10 '17 at 21:43
  • @erotavlas Computing the `count` in no way forces to to use a `for` loop. You just compute the count, and keep using the old loop that you had that actually worked correctly, but now with a count that you can use. – Servy Jan 10 '17 at 21:43
  • @Kalten but I still don't know the total until I reach the end so how does that help? – erotavlas Jan 10 '17 at 21:47
  • @Servy ok I think I see what you guys are saying, I'll give it a try – erotavlas Jan 10 '17 at 21:49
  • You says that the bottleneck is `ElementAt`. So keep your `Count()`. – Kalten Jan 10 '17 at 21:50
  • @erotavlas If you call `Count` and compute the total, then you do in fact know the total when you then go and iterate through the file's lines again... – Servy Jan 10 '17 at 21:50
  • @Servy I realized another reason I did what I did - on some lines I need to backtrack - i.e. decrement the counter so I can go back a line - is this possible with foreach? – erotavlas Jan 10 '17 at 21:52
  • 1
    @erotavlas You'll need to explicitly keep track of the previous line, if you need to know what was on the previous line. – Servy Jan 10 '17 at 21:53
  • @Servy ok, kind of a pain, but I guess it will be faster than my current code – erotavlas Jan 10 '17 at 21:54
  • @erotavlas I bet it would be much much faster. – Ivan Stoev Jan 10 '17 at 21:55
  • @Dan They're not my requirements. – Servy Jan 10 '17 at 22:45
  • @erotavlas I've incorporated your additional requirements into my answer. I hope you find it useful. – Dan Jan 10 '17 at 22:55

2 Answers2

2

How about using foreach? It's designed to handle exactly this situation.

IEnumerable<String> lines = File.ReadLines("c:\myfile.txt", new UTF8Encoding());

StringBuilder sb = new StringBuilder();

string previousLine = null;
int lineCounter = 0;
int totalLines = lines.Count();

foreach (string line in lines) {

    // show progress
    float done = ++lineCounter/totalLines;
    Debug.WriteLine($"{done*100:0.00}% complete");

    //get the line and do something
    sb.AppendLine(line);

    //do something else, like look at the previous line to compare
    if (line == previousLine) {
        Debug.WriteLine($"Line {lineCounter} is the same as the previous line.");
    }

    previousLine = line;
}
Dan
  • 901
  • 11
  • 25
  • The only problem I think of is that it can get confusing down the road if you have to refer to previous lines multiple times. Say you need to continue analyzing a line from where you left off, if you go back one, then that line becomes the new current line and you continue from there, however the foreach loop becomes out of sync with the new iteration, since its still referencing a line ahead of the current line. Do you see what I'm saying? – erotavlas Jan 10 '17 at 22:55
  • In my code, `line` is always current. `previousLine` is always the one before that. You're not moving the pointer within the file. You're just holding on to the previous line you read. Nothing gets out of sync. – Dan Jan 10 '17 at 22:58
0

Sure, you can use a foreach instead the for loop, so you don't have to go back and reference the line via its index:

foreach (string line in lines)
{
    sb.AppendLine(line);
}

You will also no longer need the int totalLines = lines.Count(); line because you don't need the count for anything (unless you're using somewhere you're not showing).

rory.ap
  • 34,009
  • 10
  • 83
  • 174
  • Although this will still read the file twice, of course... once for the initial count (assuming the `.Count()` call is still there). – Jon Skeet Jan 10 '17 at 21:28
  • @JonSkeet They won't need to compute the initial count if they don't have a `for` loop. – Servy Jan 10 '17 at 21:29
  • @JonSkeet -- I'm using the `lines` which is already read....or I'm not sure what you're saying. – rory.ap Jan 10 '17 at 21:29
  • @Servy: Well maybe - it's hard to tell as we don't know what the other code does. – Jon Skeet Jan 10 '17 at 21:29
  • Don't need the Count() if it's only use was for the upper bound on the for loop – pinkfloydx33 Jan 10 '17 at 21:30
  • @rory.ap: I think it would be worth pointing out that the OP should remove the `Count()` call then - it looks like you're only replacing the `for` loop with a `foreach` loop, which will certainly help, but if they don't need the count they really, really shouldn't fetch it. – Jon Skeet Jan 10 '17 at 21:30
  • If he used the count later, he could just increment a counter as he looped through each line. – Mason11987 Jan 10 '17 at 21:30
  • @JonSkeet -- I'll update. I was assuming removing the count would be obvious since it's used for the for loop. – rory.ap Jan 10 '17 at 21:31
  • Never assume a piece of code is not needed - the count is there for a reason which is why I posted it. (see edit) – erotavlas Jan 10 '17 at 21:46
  • 1
    @erotavlas -- I don't see where else you're using `totalLines` other than the `for` loop. Also, it's the opposite: *always* assume a piece of code is not needed until you know it is. – rory.ap Jan 10 '17 at 21:54