I have a web app written in PHP that needs to parse data files. It uses preg_replace to get rid of any column headings etc. at the top of the file and then loops through the rest of the file line by line, using preg_match to pull out the relevant values on each line.

Here is the format of the file:

Column heading 1  Column heading 2  Column heading 3 Column heading 4
       0.000000000E+0000     0.000000000E+0000     0.000000000E+0000     0.000000000E+0000  
       0.000000000E+0000     0.000000000E+0000     0.000000000E+0000     0.000000000E+0000  
       0.000000000E+0000     0.000000000E+0000     0.000000000E+0000     0.000000000E+0000  
       0.000000000E+0000     0.000000000E+0000     0.000000000E+0000     0.000000000E+0000  
       0.000000000E+0000     0.000000000E+0000     0.000000000E+0000     0.000000000E+0000  

There could be up to 10,000 rows of data in the file. A customer has just asked whether I can match only the last row of data, rather than taking data from every row.

Therefore I need a regex to remove the column headings and every line of data except the last one. Here is the code I am using:

$startsWith = "/^Column heading 1  Column heading 2  Column heading 3 Column heading 4\r\n(   [0-9]{1}\.[0-9]{9}E[\+-][0-9]{4}     [0-9]{1}\.[0-9]{9}E[\+-][0-9]{4}     [0-9]{1}\.[0-9]{9}E[\+-][0-9]{4}     [0-9]{1}\.[0-9]{9}E[\+-][0-9]{4}  \r\n(?!$))*/s";

$str = preg_replace($startsWith,'',$str);

This should leave me with only the last line of data. It works absolutely fine if I run the script through command-line PHP. However, if I run it through the browser, I get no response from the server, just a blank page.

I have seen this question already: RegExp in preg_match function returning browser error

But it doesn't really help, as lowering pcre.recursion_limit just stops the regex from working at all: it fails with PREG_RECURSION_LIMIT_ERROR.
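
For reference, a failure of this kind can be detected in code: when the PCRE engine gives up, preg_replace returns null and preg_last_error() reports the reason. The following is a minimal sketch; replaceOrFail is a hypothetical helper name, not part of the original script. It only catches limits that the engine enforces itself, so it would not help if the server process crashes outright.

```php
<?php
// Hypothetical helper (name chosen for illustration): wraps preg_replace and
// turns a PCRE engine failure into an exception instead of a silent null.
function replaceOrFail($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);

    // preg_replace returns null when the engine aborts, for example after
    // hitting pcre.recursion_limit or pcre.backtrack_limit.
    if ($result === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result;
}
```

The error code can be compared against constants such as PREG_RECURSION_LIMIT_ERROR or PREG_BACKTRACK_LIMIT_ERROR to decide what to log.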

Is there a more efficient way, using regex, to trim everything except the last line of data in a large file? Or some settings I can tweak to make it work through Apache?

Update

Thanks for everyone's suggestions but, because of the way the system's built, I NEED to use regex. For parsing this particular file it's not ideal I know, but for other file types it's the only way. The system is built to parse many very odd file types and regex was the only way of achieving this. The regex I supplied works when running through PHP CLI, but not through a web page - is there a more efficient regex or some settings I can change to make it work through Apache?

user1578653
  • Why don't you read the file into an array with `file()`, and then just get the last element of the array? A regexp really seems like the wrong tool for this. – Barmar Jul 30 '14 at 09:31
  • Or read the file line by line in a loop, and save the last line in a variable. – Barmar Jul 30 '14 at 09:32
  • Unfortunately this is the way the program is designed, because each data 'chunk' might not be one line. A chunk could be 2 lines, 4 lines or not even the same within a file. The system is trying to cope with many many different file formats, and regex seemed like the most flexible answer. – user1578653 Jul 30 '14 at 09:34
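
Since the asker needs a regex-only approach, one option is to avoid repeating a large group once per row. The sketch below is an illustration under assumptions, not the asker's code: keepLastDataLine is a hypothetical name, and the pattern does not validate the number format of each row. It consumes the whole input with a single `.*` and backs up to the last line break that is followed by a non-blank line. My understanding is that a single-character repeat like this needs far less engine stack than a repeated group, which may be what differs between CLI and Apache, but that is worth verifying on the target server.

```php
<?php
// Hypothetical helper name; the pattern is a sketch, not a drop-in replacement.
function keepLastDataLine($str)
{
    // \A.*           with the /s flag, greedily consume the entire input
    // \R             then back up to the last line break ...
    // (?=[^\r\n]*\S) ... that is followed by a line containing a non-space character
    return preg_replace('/\A.*\R(?=[^\r\n]*\S)/s', '', $str);
}

$str = "Column heading 1  Column heading 2\r\n"
     . "       1.000000000E+0000     2.000000000E+0000  \r\n"
     . "       3.000000000E+0000     4.000000000E+0000  \r\n";

// Prints only the final data row, including its trailing line break.
echo keepLastDataLine($str);
```

If the input has no line break at all, the pattern does not match and the string is returned unchanged. The existing preg_match step can still be applied to the remaining row to pull out and validate the values.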

2 Answers

Split the string on newlines, and get the last line:

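// Trim trailing line breaks first; otherwise the last element is an empty string.
$lines = explode("\n", rtrim($str, "\r\n"));
// The file uses \r\n line endings, so strip the leftover \r as well.
$last_line = rtrim(array_pop($lines), "\r");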
Barmar

I'd suggest using fseek, because loading the whole 10,000-line file first wastes CPU time and memory when only the last line is needed:

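   $fp = fopen("file.txt", "r");
   fseek($fp, 0, SEEK_END);
   $pos = ftell($fp);
   // Skip line breaks at the very end of the file so they do not end the search early.
   while ($pos > 0 && fseek($fp, $pos - 1) === 0 && strpos("\r\n", fgetc($fp)) !== false) {
         $pos--;
   }
   // Walk backwards to the newline that precedes the last line (or to the start of the file).
   while ($pos > 0 && fseek($fp, $pos - 1) === 0 && fgetc($fp) !== "\n") {
         $pos--;
   }
   fseek($fp, $pos);
   $t = fgets($fp); // the last line, including its line ending
   fclose($fp);
   return $t; // assumes this code sits inside a function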

Source: http://forums.devshed.com/php-development-5/php-quick-way-to-read-last-line-156010.html

Dennis Stücken