30

fopen is failing when I try to read in a very moderately sized file in PHP. A 6 meg file makes it choke, though smaller files around 100k are just fine. I've read that it is sometimes necessary to recompile PHP with the -D_FILE_OFFSET_BITS=64 flag in order to read files over 20 gigs or something ridiculous, but shouldn't I have no problems with a 6 meg file? Eventually we'll want to read in files that are around 100 megs, and it would be nice to be able to open them and then read through them line by line with fgets, as I'm able to do with smaller files.

What are your tricks/solutions for reading and doing operations on very large files in PHP?

Update: Here's an example of a simple code block that fails on my 6 meg file - PHP doesn't seem to throw an error, it just returns false. Maybe I'm doing something extremely dumb?

$rawfile = "mediumfile.csv";

if($file = fopen($rawfile, "r")){  
  fclose($file);
} else {
  echo "fail!";
}

Another update: Thanks all for your help, it did turn out to be something incredibly dumb - a permissions issue. My small file inexplicably had read permissions when the larger file didn't. Doh!

  • Are you just trying to pass the file through, i.e. download it? Or are you actually parsing the data in the files for some purpose? Thx. – DreamWerx Oct 02 '08 at 13:21
  • It should not fail without generating a warning/error. Please turn all errors on with error_reporting(E_ALL) and make sure display_errors is set to On so they show in your browser, or check your web server's error log (a minimal sketch of this setup follows below). – Philip Reynolds Oct 02 '08 at 13:54
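
For reference, a minimal sketch of the debugging setup suggested in the comment above, wrapped around the question's own snippet (nothing here is assumed beyond the filename already shown):

// Surface every notice/warning/error while debugging the fopen() failure.
error_reporting(E_ALL);
ini_set('display_errors', '1');   // or check the web server's error log instead

$rawfile = "mediumfile.csv";
if (($file = fopen($rawfile, "r")) === false) {
    // With display_errors on, PHP's own warning explaining *why* fopen failed
    // (missing file, no read permission, open_basedir, ...) appears alongside this.
    echo "fail!";
} else {
    fclose($file);
}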

8 Answers

57

Are you sure that it's fopen that's failing and not your script's timeout setting? The default is usually around 30 seconds or so, and if your file is taking longer than that to read in, it may be tripping that up.

Another thing to consider may be the memory limit on your script - reading the file into an array may trip over this, so check your error log for memory warnings.
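
For what it's worth, a small sketch of how both limits could be inspected and raised at runtime, assuming the host allows overriding them:

// Inspect the current limits before reading a large file.
echo "max_execution_time: " . ini_get('max_execution_time') . "s\n";
echo "memory_limit: " . ini_get('memory_limit') . "\n";

// Raise them for this script only (has no effect if the host disallows it).
set_time_limit(300);             // allow up to 5 minutes of execution time
ini_set('memory_limit', '256M'); // only needed if you hold the whole file in memory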

If neither of the above is your problem, you might look into using fgets to read the file in line by line, processing as you go.

$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        // Process buffer here..
    }
    fclose($handle);
}

Edit

PHP doesn't seem to throw an error, it just returns false.

Is the path to $rawfile correct relative to where the script is running? Perhaps try setting an absolute path here for the filename.
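
For example, a hedged sketch using the filename from the question (is_readable() also helps distinguish a bad path from a permissions problem, which is what the asker eventually found):

// Build an absolute path relative to this script's directory, so the result
// doesn't depend on the current working directory.
$rawfile = __DIR__ . '/mediumfile.csv';

if (!is_readable($rawfile)) {
    // Distinguishes "wrong path" from "no read permission".
    echo file_exists($rawfile) ? "file exists but is not readable" : "file not found at $rawfile";
} elseif ($file = fopen($rawfile, 'r')) {
    fclose($file);
}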

ConroyP
  • This is the only practical way to open really big files. I am processing a 1.5 GB file with this approach without any problem. Other solutions like file_get_contents() or file() read the whole file into memory; this approach processes it line by line. – StanleyD Aug 22 '13 at 06:06
  • Why 4096 means one line? – Phoenix Feb 03 '15 at 08:54
  • @Phoenix 4096 means, read at most 4096 - 1 bytes iff no line breaks are encountered. Check the manual. – a3f Feb 08 '15 at 23:34
  • 3
  • For me `stream_get_line` is faster than `fgets`; check out this comparison: https://gist.github.com/joseluisq/6ee3876dc64561ffa14b (a short usage sketch follows below). – joseluisq Mar 11 '16 at 16:04
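
For completeness, a small sketch of the stream_get_line() variant mentioned in the last comment, using the same example file as the answer above (the 1 MB length is an arbitrary choice):

$handle = fopen("/tmp/uploadfile.txt", "r") or die("Couldn't get handle");

// Reads up to 1 MB per call, stopping at the delimiter, which (unlike fgets)
// is not included in the returned string.
while (($line = stream_get_line($handle, 1024 * 1024, "\n")) !== false) {
    // Process $line here..
}
fclose($handle);
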
13

I ran two tests, with a 1.3 GB file and a 9.5 GB file.

1.3 GB

Using fopen()

This process used 15555 ms for its computations.

It spent 169 ms in system calls.

Using file()

This process used 6983 ms for its computations.

It spent 4469 ms in system calls.

9.5 GB

Using fopen()

This process used 113559 ms for its computations.

It spent 2532 ms in system calls.

Using file()

This process used 8221 ms for its computations.

It spent 7998 ms in system calls.

Seems file() is faster.
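
For context, this is not necessarily how the figures above were produced; one way to collect user ("computation") and system-call times is getrusage(), sketched below with a placeholder file path:

// Hypothetical timing harness: getrusage() reports user and system time.
function rusage_ms(array $ru, string $which) {
    return $ru["ru_{$which}.tv_sec"] * 1000 + $ru["ru_{$which}.tv_usec"] / 1000;
}

function benchmark(callable $fn) {
    $before = getrusage();
    $fn();
    $after = getrusage();
    return [
        'user_ms'   => rusage_ms($after, 'utime') - rusage_ms($before, 'utime'),
        'system_ms' => rusage_ms($after, 'stime') - rusage_ms($before, 'stime'),
    ];
}

$path = 'bigfile.csv'; // placeholder path

// Line-by-line with fgets() on an fopen() handle.
$stats = benchmark(function () use ($path) {
    $fp = fopen($path, 'r');
    while (fgets($fp) !== false) {
        // just read every line
    }
    fclose($fp);
});
printf("fopen/fgets: %.0f ms user, %.0f ms system\n", $stats['user_ms'], $stats['system_ms']);

// Whole file at once with file().
$stats = benchmark(function () use ($path) {
    $lines = file($path); // loads every line into an array
});
printf("file():      %.0f ms user, %.0f ms system\n", $stats['user_ms'], $stats['system_ms']);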

Al-Punk
9

• The fgets() function is fine until the text files pass 20 MB; beyond that, parsing speed drops considerably.

• The file_get_contents() function gives good results up to 40 MB and acceptable results up to 100 MB, but it loads the entire file into memory, so it's not scalable.

• The file() function is disastrous with large text files because it creates an array containing each line of text, so the whole file is held in memory and the memory used is even larger than the file itself.
In fact, I could only manage to parse a 200 MB file with memory_limit set to 2 GB, which was no use for the 1+ GB files I intended to parse.

When you have to parse files larger than 1 GB, the parsing time exceeds 15 seconds, and you want to avoid loading the entire file into memory, you have to find another way.

My solution was to parse the data in arbitrarily small chunks:

$filesize = get_file_size($file);
$fp = @fopen($file, "r");
$chunk_size = (1<<24); // 16MB arbitrary
$position = 0;

// if handle $fp to file was created, go ahead
if ($fp) {
   while(!feof($fp)){
      // move pointer to $position in file
      fseek($fp, $position);

      // take a slice of $chunk_size bytes
      $chunk = fread($fp,$chunk_size);

      // searching the end of last full text line (or get remaining chunk)
      if ( !($last_lf_pos = strrpos($chunk, "\n")) ) $last_lf_pos = mb_strlen($chunk);

      // $buffer will contain full lines of text
      // starting from $position to $last_lf_pos
      $buffer = mb_substr($chunk,0,$last_lf_pos);
      
      ////////////////////////////////////////////////////
      //// ... DO SOMETHING WITH THIS BUFFER HERE ... ////
      ////////////////////////////////////////////////////

      // Move $position
      $position += $last_lf_pos;

      // if remaining is less than $chunk_size, make $chunk_size equal remaining
      if(($position+$chunk_size) > $filesize) $chunk_size = $filesize-$position;
      $buffer = NULL;
   }
   fclose($fp);
}

The memory used is only $chunk_size, and the speed is slightly lower than with file_get_contents(). I think the PHP group should use my approach to optimize its parsing functions.

*) Find the get_file_size() function here.

Tinel Barb
  • This is incomplete; fread moves the file pointer. By not resetting the position you lose the first chunk, a big one too (16 MB). Test first. – ion Mar 03 '19 at 22:27
  • Thanks, Ionut, for your useful observation. Code updated. – Tinel Barb Apr 16 '19 at 07:46
  • I tried this with a large file (ca. 256 MB), but the loop seems to get stuck in the last part of the buffer. The buffer seems to contain only one line in the last < 16 MB part, so it reads out every line singly and takes forever to finish. – GerritElbrink Apr 08 '21 at 11:01
1

Well, you could try using the readfile function if you just want to output the file.

If that's not the case, maybe you should think about the design of the application: why do you need to open such large files during web requests?
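
A minimal sketch of the readfile() route, assuming the goal really is just to send the file to the client (the path and Content-Type are placeholders):

$path = '/path/to/largefile.ext'; // placeholder path

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path); // streams the file to the client without loading it all into PHP memory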

Fionn
  • We've got to automate adding large sets of data, so large CSV files can be uploaded by the user and are parsed and integrated into the database by the application. I'd love other suggestions for approach if you think reading and parsing uploaded files with PHP isn't the best way to go. –  Oct 02 '08 at 13:22
  • I wouldn't think PHP would have a problem with 6 MB CSV files? Seems like a small enough file for it to handle. As per the comments above, please post the exact error and/or code. Could it be a memory error you're hitting? Or max_execution_time? We need more info to help. – DreamWerx Oct 02 '08 at 13:25
1

I used fopen to open video files for streaming, using a PHP script as a video streaming server, and I had no problem with files larger than 50-60 MB.

Enrico Murru
0

For me, fopen() has been very slow with files over 1 MB; file() is much faster.

Just reading lines 100 at a time and creating batch inserts, fopen() takes 37 seconds vs. 4 seconds for file(). It must be the string-to-array step built into file().
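
For illustration, a sketch of the 100-lines-per-batch approach described above using fgets(), so the whole file never sits in memory; insert_batch() is a hypothetical placeholder for whatever builds the multi-row INSERT:

// Read a CSV 100 lines at a time and hand each batch to a batch-insert routine,
// so memory use stays flat regardless of file size.
$handle = fopen('mediumfile.csv', 'r') or die("Couldn't get handle");

$batch = [];
while (($line = fgets($handle)) !== false) {
    $batch[] = str_getcsv($line);
    if (count($batch) === 100) {
        insert_batch($batch); // placeholder: build one multi-row INSERT from these 100 rows
        $batch = [];
    }
}
if ($batch) {
    insert_batch($batch); // flush the final partial batch
}
fclose($handle);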

I'd try all of the file handling options to see which will work best in your application.

RightClick
-1

Have you tried file()?

http://is2.php.net/manual/en/function.file.php

Or file_get_contents()?

http://is2.php.net/manual/en/function.file-get-contents.php
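
For reference, a quick sketch of both calls (keeping in mind the memory caveat in the comment below):

// file() returns the whole file as an array of lines ...
$lines = file('mediumfile.csv', FILE_IGNORE_NEW_LINES);
foreach ($lines as $line) {
    // process $line
}

// ... while file_get_contents() returns it as a single string.
$contents = file_get_contents('mediumfile.csv');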

Ólafur Waage
  • Be careful with file_get_contents() for large files. Although 6 megs should be fine, streaming is much better since it does not read the entire file into memory first. – Dustin Graham Apr 10 '14 at 18:04
-1

If the problem is caused by hitting the memory limit, you can try setting it to a higher value (this may or may not work, depending on PHP's configuration).

This sets the memory limit to 12 MB:

ini_set("memory_limit", "12M");
Juan Pablo Califano
  • Note: While this may help, it only postpones the problem: once a 15 MB file comes in, the problem comes back. (If your files won't ever go over a certain limit, this may make the problem go away.) – Piskvor left the building Sep 09 '10 at 08:57