18

I have a script that parses large files line by line. When it encounters an error that it can't handle, it stops, notifying us of the last line parsed.

Is this really the best / only way to seek to a specific line in a file? (fseek() is not usable in my case.)

<?php

// $fp is an already-open file handle
for ($i = 0; $i < 100000; $i++) {
    fgets($fp); // read the line and discard it
}

I don't have a problem using this, it is fast enough - it just feels a bit dirty. From what I know about the underlying code, I don't imagine there is a better way to do this.

jasonbar

5 Answers

37

An easy way to seek to a specific line in a file is to use the SplFileObject class, which supports seeking to a line number (seek()) or byte offset (fseek()).

$file = new SplFileObject('myfile.txt');
$file->seek(9999);     // Seek to line no. 10,000
echo $file->current(); // Print contents of that line

Behind the scenes, seek() does just what your PHP code does (only in C): it reads through the file line by line until it reaches the requested line.
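As a small self-contained illustration of the zero-based line index, here is a sketch that writes a throwaway file and seeks into it. The temporary file and its contents are made up for the demonstration; only SplFileObject, seek() and current() come from the answer above.

```php
<?php
// Demonstration data: a four-line temporary file (an assumption for this sketch).
$path = tempnam(sys_get_temp_dir(), 'seek');
file_put_contents($path, "first\nsecond\nthird\nfourth\n");

$file = new SplFileObject($path);
$file->seek(2);                          // zero-based index: 2 is the third line
$line = rtrim($file->current(), "\n");   // current() includes the trailing newline
echo $line, PHP_EOL;

$file = null;   // release the file handle before deleting the file
unlink($path);
```

So to reach line N as a human would count it, pass N - 1 to seek().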

salathe
  • Nice! Came across this a while ago and started using it. – jasonbar Oct 01 '11 at 13:37
  • In this case, seek will directly read line 10,000, without walking through lines 1 - 9,999 to reach the given line? – Googlebot Oct 08 '11 at 11:28
  • @Ali: no, how do you think it knows where the lines start? It reads through the file. There are other alternatives if you do want to directly seek to a line but they involve potentially complex systems to keep track of where lines start in the file. – salathe Oct 09 '11 at 10:18
  • Could you please give me some hints? I searched a lot for a practical way to read a line without reading the entire file (considering big files of GB size). – Googlebot Oct 09 '11 at 10:44
  • @Ali: If I recall correctly there is a question here on SO with details of one implementation, or I could share the details of my own (though comments don't offer enough space). Sorry I don't have a link for the question that I (think that I) saw. – salathe Oct 09 '11 at 18:57
  • Thanks for your kind attention, salathe. Please take a look at this question: http://stackoverflow.com/questions/7709908/mapping-a-flat-text-file – Googlebot Oct 10 '11 at 07:56
  • There seems to be a bug with large files: after a certain line number, seek just stays on the same line, creating infinite loops if used with while ->eol – Tofandel Aug 18 '22 at 17:22
5

If you only have the line number to go on, there is no other method of finding the line. Files are not line based (or even character based), so there is no way to simply jump to a specific line in a file.

There might be other ways of reading the lines that are slightly faster, like reading larger chunks of the file into a buffer and reading lines from that, but you could only hope for a gain of a few percent. Any method to find a specific line in a file still has to read all data up to that line.
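For what it's worth, the chunked approach could look something like the sketch below. The function name seekToLineChunked() and the 64 KiB default chunk size are my own assumptions, not an existing PHP API, and I make no claim about how much faster (if at all) it is than a plain fgets() loop. It still reads every byte before the target line; it just counts "\n" bytes in bigger reads.

```php
<?php
// Hypothetical helper: move $handle to the start of 1-based line $line by
// scanning fixed-size chunks and counting "\n" bytes.
// Returns false when the file contains fewer than $line - 1 newline bytes.
function seekToLineChunked($handle, int $line, int $chunkSize = 65536): bool
{
    rewind($handle);
    $remaining = $line - 1;   // newlines still to skip before the target line
    $offset = 0;              // byte offset at which the current chunk starts

    while ($remaining > 0 && !feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $pos = -1;
        // Step from newline to newline inside this chunk until enough are skipped.
        while ($remaining > 0 && ($pos = strpos($chunk, "\n", $pos + 1)) !== false) {
            $remaining--;
        }
        if ($remaining === 0) {
            // $pos is the last newline skipped; the target line starts right after it.
            return fseek($handle, $offset + $pos + 1) === 0;
        }
        $offset += strlen($chunk);
    }

    return $remaining === 0;
}
```

After a successful call, the next fgets($handle) returns the requested line.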

Guffa
  • Yeah, I figured as much. Somehow I thought that a nice `fseekbyline()` that was just a wrapper for the C code would make me feel better. heh. – jasonbar Aug 27 '10 at 22:50
4

I know this is a late post, but it may help some people: I once wrote a function along the lines of fseekbyline().

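function GoToLine($handle, $line)
{
    fseek($handle, 0); // start from the beginning of the file
    $bufcarac = 0;     // running byte count of the lines read so far

    // Read lines 1 .. $line - 1 (line numbers are 1-based)
    for ($i = 1; $i < $line; $i++) {
        $ligne = fgets($handle);
        $bufcarac += strlen($ligne); // in the end, $bufcarac holds the byte offset of the target line
    }

    fseek($handle, $bufcarac); // position the pointer at the start of line $line
}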

There is no error handling: if you ask for a line below 1, or for line 203 of an empty file, you will get nothing useful. The same goes for seeking past the end of the file.

Olive
  • By the time PHP has gone through the for loop, the pointer will be where you have desired. Simply calling fgets($handle) is enough to put in the for loop, and you can avoid memory loading up in the $bufcarac and $ligne variables. – Gregory Aug 16 '16 at 15:35
1
rewind($handle);

for ($i=0; $i < $desired_line; $i++) {
    fgetcsv($handle, 1000, ",");
}

This works for me when I need to rewind to a specific line multiple times in my script.

I am not sure if this eats up memory or speed, but it does the trick.

Julix
  • This is short and to the point. Although the fgetcsv is specific to CSV files rather than any text file. It's helpful for me at least. – Gregory Aug 16 '16 at 15:36
0

If I understand correctly, you want to seek to the specific line at some point after you have found an error. If that is the case, you probably store or print the line-number of the bad line somewhere, depending on what you mean by "notify".

Unless you really mean that you cannot use fseek()*, what you can do is to also store/print the position in the file where the bad line starts. Then you can fseek().

* How, in that case, would fseekbyline() be usable if it existed?
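A rough sketch of that idea, assuming fseek() is available after all: call ftell() before each fgets() so that the byte offset of the failing line can be reported along with its line number. The function name findFirstBadLine() and the $isValid callback are hypothetical stand-ins for whatever your parser actually does.

```php
<?php
// Hypothetical parser loop: returns the 1-based line number and the byte
// offset of the first line rejected by $isValid, or null if every line passes.
function findFirstBadLine($handle, callable $isValid): ?array
{
    $lineNumber = 0;
    while (true) {
        $offset = ftell($handle);   // byte offset where the next line starts
        $line = fgets($handle);
        if ($line === false) {
            return null;            // end of file reached without an error
        }
        $lineNumber++;
        if (!$isValid($line)) {
            return ['line' => $lineNumber, 'offset' => $offset];
        }
    }
}
```

On a later run, fseek($handle, $bad['offset']) with the stored value puts the pointer back at the start of the bad line without rereading everything before it.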

Lajnold