4

I have a set of very large text files, and I don't want to read each file in full. I am only interested in the block that starts with ** DATA IMP and ends with ** DATA END. Everything between those two lines is data I need to use.

However, this block may appear at the start of the file, at the end, or anywhere in between. I want the reading process to be fast. For example, if the block is at the start of the file, reading it should be as quick as Linux head, which takes very little time even on large files.

What is the optimal way to read these large files so that, once I have this block, I don't have to read the rest of the file?

File Content Sample: (600 MB or greater)

Dummy text
Dummy text
Dummy text
Dummy text
** DATA IMP
** d
** e
** f
** g
** DATA END
Dummy text
Dummy text
Dummy text
AND SO ON ...

EDIT: *OK. I am assuming the data is at the top of the file since I don't have another option.*

File Content Sample: (600 MB or greater)

** DATA IMP
** d
** e
** f
** g
** DATA END
Dummy text
Dummy text
Dummy text
Dummy text
Dummy text
Dummy text
AND SO ON ...
django
  • and I want the winning lottery numbers, but that does not make it possible. You will need to scan the file until you reach DATA END. Possible methods are reading each line or grepping. head will not help you, as it only shows the top x lines of a file irrespective of content – Anigel Aug 02 '13 at 06:23
  • Ok. Any PHP solution that may be fastest is welcome. I am on Windows so I can't use grep. – django Aug 02 '13 at 06:28
  • increase the php server execution time – Arun Kumar Aug 02 '13 at 06:29
  • In Windows there's a command called "findstr", have you tried it? It's some equivalent to "grep". – Alejandro Iván Aug 02 '13 at 06:30
  • it seems grep is available for windows. http://gnuwin32.sourceforge.net/packages/grep.htm – django Aug 02 '13 at 06:44
  • just install cygwin so you get grep and the rest of text-utils. – imel96 Aug 02 '13 at 06:56

5 Answers

2

Use the SplFileObject class.

First use SplFileObject::fgets, which the manual describes as follows:

Returns a string containing the next line from the file, or FALSE on error.

Something like this

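$file = new SplFileObject("file.txt");
$counter = 0; // zero-based index of the line being examined
while (!$file->eof()) {
   // fgets() keeps the trailing line ending, so trim it before comparing
   $line = rtrim($file->fgets(), "\r\n");
   if ($line === 'needle') break;
   $counter++;
}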

Then you can use the $counter variable as a reference to the line that contains your needle. After that it's pretty trivial to get the information you want. Want to retrieve that line? Or the whole document after it? Or before it? See the SplFileObject documentation and use its methods to do whatever else you need to do.
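As a rough sketch of that last step, here is one way to jump back to a line once you know its index. The helper name lineAt is made up for illustration, and $counter is assumed to be a zero-based line index. Note that SplFileObject::seek still reads through the file from the start to reach the line, so this is a convenience rather than a speed-up.

```php
<?php
// Hypothetical helper (not part of SplFileObject): returns the line at a
// zero-based index, with its trailing line ending removed.
function lineAt(string $path, int $counter): string
{
    $file = new SplFileObject($path);
    // seek() takes a zero-based line number; it walks the file from the
    // beginning, so the cost grows with $counter.
    $file->seek($counter);
    return rtrim((string) $file->current(), "\r\n");
}
```

For example, lineAt("file.txt", $counter) would hand back the needle line itself.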

Seph
  • +1 because SplFileObject actually has a method to get a specific line: http://php.net/manual/en/splfileobject.seek.php – AVProgrammer Dec 01 '15 at 17:02
0

Unless the file is indexed, stored in a database or something similar, you have to read through it from the start until you find ** DATA IMP.

The only other option would be if the text were at a known position in the file, which yours is not.
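To illustrate what a known position would buy you, here is a minimal sketch that jumps straight to a byte offset instead of scanning. The helper name readAt and both the offset and the length are assumptions for illustration; nothing in the question provides them.

```php
<?php
// Hypothetical helper: reads $length bytes starting at byte $offset.
// Only useful when the position of the block is known in advance.
function readAt(string $path, int $offset, int $length): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $offset);          // skip directly to the offset, no scanning
    $data = fread($handle, $length);  // $length must be at least 1
    fclose($handle);
    return $data === false ? '' : $data;
}
```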

If you want to extract the text:

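$file = new SplFileObject("file.txt");

$lines = array();

// Skip ahead to the start marker. fgets() keeps the line ending, so trim it.
while (!$file->eof())
{
  $line = rtrim($file->fgets(), "\r\n");

  if ($line === '** DATA IMP')
    break;
}

// Collect lines until the end marker; stop at EOF if it never appears.
while (!$file->eof())
{
  $line = rtrim($file->fgets(), "\r\n");

  if ($line === '** DATA END')
    break;

  $lines[] = $line;
}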
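
The same idea can be wrapped in a reusable function. This is only a sketch using plain fopen/fgets instead of SplFileObject; the name extractBlock and its default marker arguments are my own choices. It stops reading as soon as the end marker is seen, so a block near the top of a 600 MB file costs about as much as reading those few lines.

```php
<?php
// Hypothetical helper: returns the lines between the two markers, or null
// if the file cannot be opened or the start marker is never found.
function extractBlock(
    string $path,
    string $start = '** DATA IMP',
    string $end = '** DATA END'
): ?array {
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }

    $lines = null; // stays null until the start marker has been seen
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // fgets() keeps the line ending

        if ($lines === null) {
            if ($line === $start) {
                $lines = [];
            }
            continue;
        }

        if ($line === $end) {
            break; // stop here; the rest of the file is never read
        }
        $lines[] = $line;
    }

    fclose($handle);
    return $lines;
}
```

If the end marker is missing, the function returns everything after the start marker up to the end of the file.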
Jimmy T.
0

Have you tried something like:

<?php
    $raw = shell_exec('grep \'\*\*\' /path/to/file');
    var_dump($raw);
?>

Sorry, I just noticed in a comment that you are on Windows. I guess there must be a Windows version of grep; it may be worth looking into that.

Tigger
0

I think I will have to rely on external tools such as grep (the GnuWin32 build on Windows) for my specific needs since, as I understand it, grep performs better than PHP here.

Kindly add comments if you disagree.

django
  • It could work if the file is broken up by newlines. If there are no newlines, then it will just dump the whole file. Btw, I agree, grep is faster. – imel96 Aug 02 '13 at 06:53
  • I have newlines in data – django Aug 02 '13 at 06:58
  • Have a look at this SO thread then: http://stackoverflow.com/questions/87350/what-are-good-grep-tools-for-windows – Tigger Aug 02 '13 at 08:18
0

A Windows equivalent to grep is findstr:

Searches for strings in files.

findstr