3

I am already using this example of how to read large data files in PHP line by line.

Now, what I'd like to do is obtain the total number of rows in the file so that I can display a percentage complete, or at least show the total number of rows, to give some idea of how much processing is left to be done.

Is there a way to get the total number of rows without reading in the entire file twice? (once to count the rows and once to do the processing)

Justin

5 Answers

12

Poor man's answer:

No, but you can estimate. Compute a simple average line size (use the first 250 lines) and go with that.

estNumOfLines = sizeOfFile / avgLineSize
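As a rough sketch of that formula in PHP (the function name estimateLineCount and its $sampleLines parameter are made up for illustration, and the estimate is only as good as the sample is representative of the rest of the file):

```php
<?php
// Hypothetical helper: estimate the total line count from the first few lines.
function estimateLineCount(string $path, int $sampleLines = 250): int
{
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return 0;
    }

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $bytes = 0;
    $lines = 0;
    while ($lines < $sampleLines && ($line = fgets($handle)) !== false) {
        $bytes += strlen($line); // strlen() counts bytes, line terminator included
        $lines++;
    }
    fclose($handle);

    if ($lines === 0) {
        return 0;
    }

    // estNumOfLines = sizeOfFile / avgLineSize
    return (int) round($size / ($bytes / $lines));
}
```

If the file has fewer lines than the sample size, the whole file gets read and the result is exact rather than an estimate.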

You could store off the number of lines in the file when you are creating the file...

Alternatively, you could display the number of KB processed, and that would be perfectly accurate.

cgp
  • +1 for your bolded suggestion (the only reasonable approach, IMO) – rmeador Apr 15 '09 at 18:34
  • I vote for just displaying the amount of data processed. Would probably be the best solution. – Kibbee Apr 15 '09 at 18:42
  • The entire process is based on the number of products processed, and I agree that usually the amount of actual data processed is preferable, but in this case the number of products processed makes more sense. – Justin Apr 16 '09 at 01:06
5

You can determine the size of the file, then gauge your progress through it by adding up the size of your reads:

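$fname = 'foofile.txt';
$fsize = filesize($fname);
$bytesRead = 0;
$handle = fopen($fname, "r") or die("Couldn't get handle");
// fgets() returns false at end of file, which ends the loop.
while (($buffer = fgets($handle, 4096)) !== false) {
  // Process buffer here..
  // Add the bytes actually read; a line is rarely exactly 4096 bytes long.
  $bytesRead += strlen($buffer);
  echo round($bytesRead / $fsize * 100, 1) . " percent read.\n";
}
fclose($handle);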

Note: code adapted from referenced answer

vezult
  • Thanks for posting the code, even though I didn't pick your solution, this will be handy for future use! – Justin Apr 15 '09 at 20:28
3

Is there any reason you need to count rows and not bytes? If all you want to know is "percent done", just track it by the number of bytes read / total bytes.
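A minimal sketch of that idea, assuming the file is read with fgets() and using ftell() for the current byte offset. The names percentDone and processWithProgress, and the two callbacks, are hypothetical and only there for illustration:

```php
<?php
// Hypothetical helper: share of the file consumed so far, based on the byte offset.
function percentDone($handle, int $totalBytes): float
{
    if ($totalBytes <= 0) {
        return 100.0;
    }
    return min(100.0, ftell($handle) / $totalBytes * 100);
}

// Hypothetical driver: calls $onLine for every line, and $onProgress each time
// the whole-number percentage changes.
function processWithProgress(string $path, callable $onLine, callable $onProgress): void
{
    $total = (int) filesize($path);
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $last = -1;
    while (($line = fgets($handle)) !== false) {
        $onLine($line);
        $pct = (int) floor(percentDone($handle, $total));
        if ($pct !== $last) {
            $last = $pct;
            $onProgress($pct);
        }
    }
    fclose($handle);
}
```

Reporting only when the percentage changes keeps the output to at most about a hundred updates, however many lines the file has.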

KenE
2

Use the Linux command wc -l filename.txt, which outputs the number of lines in the file.
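From PHP, one way to call it is through shell_exec(). This is only a sketch: it assumes a Unix-like system with wc on the PATH and a PHP setup that allows shell_exec(), and the function name countLinesWithWc is made up for illustration:

```php
<?php
// Hypothetical helper: ask `wc -l` for the line count (Unix-like systems only).
function countLinesWithWc(string $path): int
{
    // Redirecting the file into wc makes it print only the number, not the file name.
    $output = shell_exec('wc -l < ' . escapeshellarg($path));
    if (!is_string($output) || trim($output) === '') {
        throw new RuntimeException("wc failed for $path");
    }
    return (int) trim($output);
}
```

Note that wc -l counts newline characters, so a final line without a trailing newline is not included. The file is still read once by wc before your own processing pass.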

agf
majestiq
1

How would you know the number of pages in a book, without counting them?
You would measure the width of a page and the width of the book and divide one by the other.

Same here, calculate the average line length from the first few lines, then do the same math with the file size...

Itay Moav -Malimovka
  • page size is constant. line width and bytes with utf-8 (or similar) special chars is not. – OIS Apr 15 '09 at 20:12
  • Average....It is all averages when you do such calculations (available bandwidth, available CPU resources etc), When was the last time you saw ANY progress bar who got the timing right. Besides, the solution with calculating the bytes themselves is better, I wrote this as he asked about lines. – Itay Moav -Malimovka Apr 16 '09 at 02:54