4

I have a data file with two lines (two lines just for this example; in reality the file can contain millions of lines), and I read it using SplFileObject and LimitIterator with an offset. But this combination behaves strangely in some cases:

$offset = 0;
$file = new \SplFileObject($filePath);
$fileIterator = new \LimitIterator($file, $offset, 100);
foreach ($fileIterator as $key => $line) {
  echo $key;
}

Output is: 01

But with $offset set to 1, the output is blank (foreach doesn't iterate over any line).

My data file contains this:

{"generatedAt":1434665322,"numRecords":"1}
{"id":"215255","code":"NB000110"}

What am I doing wrong?

Thanks

  • I don't know if it's important or not, but there is a missing `"` at the end of the first line of your data file. – Blackus Jun 19 '15 at 08:31
  • It's just a typo from when I formatted it as code. – Miroslav Hruška Jun 19 '15 at 10:00
  • So, can this be marked as a PHP bug? Or at least as confusing behaviour? Do I understand that correctly? – Miroslav Hruška Jun 19 '15 at 10:30
  • I certainly think it is 'confusing'. I didn't expect that behaviour. I also think it messes up the `foreach` loop processing. I think it is an 'edge' case of 'end of file' processing that is not handled correctly. The fault is with `SplFileObject` – Ryan Vincent Jun 19 '15 at 12:15

2 Answers

1

Required:

Use SplFileObject to process a number of records from:

  • a given start record number
  • for a given number of records or until EOF.

The issue is that SplFileObject gets confused about the last record in the file. This prevents it from working correctly in foreach loops.

This code uses SplFileObject directly to skip records and then process records. Alas, it cannot use a foreach loop.

  • Skip a number of records from the start of the file ($offset).
  • Process a given number of records or until the end of the file ($recordsToProcess)

The code:

<?php

$filePath = __DIR__ . '/Q30932555.txt';
// $filePath = __DIR__ . '/Q30932555_1.txt';

$offset = 1;
$recordsToProcess = 100;

$file = new \SplFileObject($filePath);

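// skip the first $offset records
$file->seek($offset);

$recordsProcessed = 0;
// valid() can already report false while the last record is still unread,
// so also accept a non-empty current() before stopping.
while (     ($file->valid() || strlen($file->current()) > 0)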
         &&  $recordsProcessed < $recordsToProcess
       ) {
    $recordsProcessed++;
    echo '<br />', 'current: ', $file->key(), ' ', $file->current();
    $file->next();
}
Ryan Vincent
  • It is a little 'clumsy' but it seems to do the job. – Ryan Vincent Jun 19 '15 at 10:39
  • Of course, this solved my problem. But I really want to use LimitIterator because the code is much more readable :( – Miroslav Hruška Jun 19 '15 at 11:59
  • @MiroslavHruška, I did try! Alas, I couldn't get anything sensible when using the 'LimitIterator'. – Ryan Vincent Jun 19 '15 at 12:04
  • I know, thanks Ryan. I will report this as a PHP bug and we will see. – Miroslav Hruška Jun 19 '15 at 21:06
  • @MiroslavHruška, I tried to write a 'decorator' to fix the issues. It is a 'hard' problem to solve. The code I have provided works. It is 'clumsy'. I cannot work out how to use a `foreach` loop given what happens. I don't like the code I produced. :-/ It works but is 'fudging things' ;-/ – Ryan Vincent Jun 19 '15 at 21:18
  • The issue is that `SplFileObject->valid()` returns `false` when it reaches the last record in the list. It should return `false` after advancing to the next record after the last one. Hence the test for the `current` record having a length even though 'valid' is reporting false. ;-/ – Ryan Vincent Sep 25 '15 at 22:15
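If a `foreach` loop is the main goal, one possible approach is to wrap the while loop from the answer above in a generator. This is only a sketch: `readLines` is a hypothetical helper name (not part of SPL), the sample file content is made up for illustration, and the end-of-file test is copied from the answer rather than verified against every PHP version. It assumes PHP 7 or later for the type declarations.

```php
<?php

/**
 * Hypothetical helper, not part of SPL: yields at most $limit lines,
 * starting at zero-based line $offset.
 */
function readLines(string $filePath, int $offset, int $limit): \Generator
{
    $file = new \SplFileObject($filePath);
    $file->seek($offset); // skip the first $offset lines

    $yielded = 0;
    // Same end-of-file test as the while loop above: keep going while the
    // file is valid, or while the current line still has content.
    while ($yielded < $limit
        && ($file->valid() || strlen((string) $file->current()) > 0)
    ) {
        yield $file->key() => $file->current();
        $yielded++;
        $file->next();
    }
}

// Sample file for illustration only (two lines, no trailing newline).
$filePath = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($filePath, "first line\nsecond line");

foreach (readLines($filePath, 1, 100) as $key => $line) {
    echo $key, ': ', rtrim($line, "\r\n"), PHP_EOL;
}

unlink($filePath);
```

The calling code then reads much like the LimitIterator version, while the awkward termination check stays hidden inside the helper.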
0

The related PHP bug 65601 suggests that adding the READ_AHEAD flag fixes this. I tested it and it works as you expected.

$offset = 0;
$file = new \SplFileObject($filePath);
$file->setFlags(\SplFileObject::READ_AHEAD);
$fileIterator = new \LimitIterator($file, $offset, 100);
foreach ($fileIterator as $key => $line) {
  echo $key;
}
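To check the flag against the failing case from the question ($offset = 1), a self-contained script along these lines can be run locally. The temporary file and its two sample lines are assumptions made for illustration, and the result may depend on the PHP version, so no expected output is given here.

```php
<?php

// Hypothetical sample file: two lines, no trailing newline.
$filePath = tempnam(sys_get_temp_dir(), 'spl');
file_put_contents($filePath, "first line\nsecond line");

$offset = 1; // the case that produced no output in the question

$file = new \SplFileObject($filePath);
$file->setFlags(\SplFileObject::READ_AHEAD);

// Collect every line the iterator actually visits, keyed by line number.
$seen = [];
foreach (new \LimitIterator($file, $offset, 100) as $key => $line) {
    $seen[$key] = rtrim($line, "\r\n");
}

var_dump($seen);

// Release the file handle before deleting the temporary file.
$file = null;
unlink($filePath);
```

An empty array means the loop still skipped everything; a non-empty array means the flag made the offset work for that PHP build.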
gaddman