
I have a big text file (144,000 lines) that has a custom format like the following:

xxx
XXXfield1XXX
value1
xxx
xxx
XXXfield2XXX
value2
xxx
xxx
XXXfield3XXX
value3
xxx

But there is a syntax error (perhaps more than one) in the file, because the total number of lines in the file is not divisible by four.

How can I find the line number of the error using just RegExp?

Handsome Nerd

2 Answers


Detecting the error is easy. Imagine the following file:

log.txt

xxx
XXXfield1XXX
value1
xxx
xxx
XXXfield2XXX <----- Note that this field has no value 
xxx
xxx
XXXfield3XXX
value3
xxx
value3
xxx

Simple Scanner

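$fileSource = "log.txt";
$tagRow = "xxx";    // opens and closes every record
$tagField = "XXX";  // prefix of every field line

$rh = fopen($fileSource, 'rb');
if (!$rh) {
    trigger_error("Can't Start File Resource");
    exit(1); // nothing to scan without a file handle
}
echo "<pre>";
$i = 0; // zero-based line index
// fgets() returns false at end of file, so no empty extra line is processed
while (($l = fgets($rh)) !== false) {
    $l = trim($l);
    // Positions 0 and 3 of each record must be the row tag
    if ((($i % 4) == 0 || ($i % 4) == 3) && $l != $tagRow) {
        echo "Row tag error line $i \n";
        break;
    }

    // Position 1 must start with the field tag
    if (($i % 4) == 1 && strpos($l, $tagField) !== 0) {
        echo "Missing Field tag line $i  \n";
        break;
    }

    // Position 2 must be a value, so it may not start with a row tag or a field tag
    if (($i % 4) == 2 && (strpos($l, $tagRow) === 0 || strpos($l, $tagField) === 0)) {
        echo "Fixed Missing Value line $i \n";
        break;
    }
    $i ++;
}
fclose($rh);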

Output

  Fixed Missing Value line 6 
Baba

Write a program to read the file, one line at a time, and parse it. If a line isn't consistent with the format, then report the error and exit.

As you read each line, keep track of the line number. Base your tests on the line number using the % operator and a switch statement.

switch ($linenum % 4) {
    case 0:
        $error = (some condition that evaluates the line);
        break;
    case 1:
        $error = (some condition that evaluates the line);
        break;
    case 2:
        $error = (some condition that evaluates the line);
        break;
    case 3:
        $error = (some condition that evaluates the line);
        break;
}
if ($error) {
    echo 'Error on line ' . $linenum . ': ' . $line;
    exit;
}
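
A minimal sketch of how the skeleton above might be filled in, assuming the format from the question: `xxx` opens and closes each record, the field line looks like `XXXnameXXX`, and a value is any non-empty line that does not start with either tag. The function name `findFormatError` and the regular expressions are assumptions made for illustration. It uses an array of patterns indexed by the remainder in place of the switch statement; the logic is the same. Line numbers are 1-based here, so the position within a record is `($linenum - 1) % 4`.

```php
<?php
// Hypothetical helper: returns [line number, line] for the first line that
// breaks the assumed 4-line record format, or null if every line fits.
function findFormatError(string $path): ?array
{
    // Assumed patterns, indexed by position within a record (0..3).
    $patterns = [
        0 => '/^xxx$/',            // opening row tag
        1 => '/^XXX.+XXX$/',       // field line, e.g. XXXfield1XXX
        2 => '/^(?!xxx|XXX).+$/',  // value: non-empty, not starting with a tag
        3 => '/^xxx$/',            // closing row tag
    ];

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $linenum = 0;
    $error = null;
    while (($line = fgets($fh)) !== false) {
        $linenum++;
        $line = rtrim($line, "\r\n"); // strip the line ending only
        if (!preg_match($patterns[($linenum - 1) % 4], $line)) {
            $error = [$linenum, $line];
            break;
        }
    }
    fclose($fh);

    // A file that ends in the middle of a record is also malformed.
    if ($error === null && $linenum % 4 !== 0) {
        $error = [$linenum + 1, '(unexpected end of file)'];
    }
    return $error;
}
```

Calling `findFormatError('log.txt')` and printing the two elements of the result gives the same kind of message as the `echo` above. Because every later line shifts position after a missing or extra line, only the first reported error is meaningful; fix it and run the check again to find the next one.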
tomlogic