
I wrote a PHP script that runs through a text file (actually a 'list' file from IMDb) and stores the data in my local MySQL database.

public static function updateMovies( $list ) {
    $offset = 15;               // movies.list starts with movie names at line 16
    $handle = fopen($list, "r") or die("Couldn't get handle");
    if ($handle) {
        while (!feof($handle)) {
            $buffer = fgets($handle);
            if ($buffer === false) {            // fgets() returns false at EOF or on error
                break;
            }
            if ($offset != 0) {
                $offset--;                      // still skipping the header lines
            } elseif ($buffer[0] != '"') {      // lines starting with '"' are TV series; skip them
                $pos = strpos($buffer, '(');
                if ($pos !== false) {           // guard against lines without a year
                    $title = trim(substr($buffer, 0, $pos));
                    $year  = intval(trim(substr($buffer, $pos + 1, 4)));
                    Movie::create($title, $year);
                }
            }
        }
        fclose($handle);
    }
}

Since those list files are up to 200 MB, this takes a long time. By default, PHP's max_execution_time is set to 30 seconds.

I set this value to 300 just to see if it works. For example, my 'movies.list' file is around 80 MB, and running this script for 300 seconds inserted around 25,000 rows into my database. That isn't enough, because it hadn't even reached the movies starting with 'B'.

I know I can set max_execution_time to 0 (unlimited), but in the future I don't want this database on my localhost. I want it on my web server, and as far as I know my web host's max_execution_time is set to 90.

Any ideas how you would handle this?

– malifa
  • Don't do it through HTTP; use a command-line script. – Mark Baker Apr 22 '12 at 17:42
  • Don't do individual inserts. You'll spend a LOT of time on database overhead. Use MySQL's multi-insert to cut that a bit: `insert into table (...) values (...), (...), (...), etc...`. – Marc B Apr 22 '12 at 19:55
  • Is this a CSV or tab-delimited file? It might be a lot faster to use mysql on the CLI to LOAD DATA INFILE the data into a loading table and then run various updates on each column that you want to modify, or transform the data in other ways. – Darragh Enright Apr 22 '12 at 19:57
  • Thanks for your comments. I tried to tweak my script, for example I didn't use prepared statements, and got a little more speed out of it by building an array of 1000 movies, pushing it to the database in one go, clearing the array and moving on, with set_time_limit() between these actions (a sketch of this batching follows these comments). It performed "okay" but stopped after inserting around 200,000 rows. In the end I just wrote a little Java tool, which works like a charm :) – malifa Apr 30 '12 at 21:03
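One way the multi-row batching described above could look, as a rough sketch: buffer parsed rows in an array and flush them 1,000 at a time as a single INSERT. The $pdo connection, the movies table and its columns are assumptions for illustration, not part of the original script.

function flushBatch(PDO $pdo, array $batch) {
    if (empty($batch)) {
        return;
    }
    // One "(?, ?)" placeholder pair per buffered movie -> a single multi-row INSERT.
    $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?)'));
    $params = array();
    foreach ($batch as $row) {
        $params[] = $row['title'];
        $params[] = $row['year'];
    }
    $stmt = $pdo->prepare("INSERT INTO movies (title, `year`) VALUES $placeholders");
    $stmt->execute($params);
}

$pdo    = new PDO('mysql:host=localhost;dbname=imdb;charset=utf8', 'user', 'pass');
$handle = fopen('movies.list', 'r');
$batch  = array();
while (($buffer = fgets($handle)) !== false) {
    $pos = strpos($buffer, '(');
    if ($pos === false || $buffer[0] === '"') {
        continue;                               // no year found, or a TV-series entry
    }
    $batch[] = array(
        'title' => trim(substr($buffer, 0, $pos)),
        'year'  => intval(trim(substr($buffer, $pos + 1, 4))),
    );
    if (count($batch) >= 1000) {
        flushBatch($pdo, $batch);               // one INSERT for 1000 rows
        $batch = array();
        set_time_limit(300);                    // reset the execution timer per batch
    }
}
flushBatch($pdo, $batch);                       // insert whatever is left over
fclose($handle);

If the file can first be massaged into CSV or tab-delimited form, LOAD DATA LOCAL INFILE (Darragh's suggestion) moves the parsing and batching into MySQL itself and is usually faster still.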

2 Answers


You may either use set_time_limit($seconds), or (better) run your script from the command line through a cron entry. That way you avoid many other non-PHP-related timeout issues.
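A minimal sketch of that setup, assuming the Movie class from the question is already loadable; the script name, paths and cron schedule are placeholders. The CLI SAPI already defaults max_execution_time to 0, and the limit can also be lifted explicitly at the top of the import script.

// import_movies.php -- run from the command line, not through the web server.
// Paths, the cron schedule and the log file below are placeholders.
set_time_limit(0);                      // the CLI SAPI defaults this to 0 anyway
ini_set('memory_limit', '512M');        // optional headroom for the large list file

Movie::updateMovies('/path/to/movies.list');

// Example crontab entry, running the import nightly at 03:00:
//   0 3 * * * /usr/bin/php /path/to/import_movies.php >> /var/log/imdb_import.log 2>&1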

– sivann

I don't think it's a good idea to load such a large file directly into your database, especially when it takes so long to complete.

My Advice

Split the file into smaller chunks locally, then upload the chunks to the remote server and import them into your database there.

Example (documentation: http://en.wikipedia.org/wiki/Split_%28Unix%29):

exec('split -d -l 100000 ' . escapeshellarg($list) . ' chunk_');   // 100,000 lines per chunk, numeric suffixes
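Each chunk can then be imported in its own pass, so no single stretch of work comes near the execution limit. A minimal driver sketch; importChunk() and the chunk_ prefix are placeholders, not part of this answer:

// Hypothetical driver loop: import the files produced by split one at a time.
// importChunk() stands in for your own parsing/INSERT logic.
foreach (glob('chunk_*') as $chunk) {
    set_time_limit(90);         // reset the timer before each chunk
    importChunk($chunk);
    unlink($chunk);             // drop the chunk once it has been imported
}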

For a pure PHP implementation of the splitting itself, see

http://www.php.happycodings.com/File_Manipulation/code50.html

Or:

define('CHUNK_SIZE', 1024*1024);    // read 1 MB at a time

// Streams $filename to the output buffer in CHUNK_SIZE pieces.
// Returns the number of bytes delivered (like readfile()) when $retbytes is true.
function readfile_chunked($filename, $retbytes = TRUE) {
    $cnt = 0;
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, CHUNK_SIZE);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes) {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status) {
        return $cnt;            // number of bytes delivered, like readfile()
    }
    return $status;
}
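A call like the following (the filename is only an example) then streams the list in CHUNK_SIZE pieces and reports how many bytes were delivered:

$bytes = readfile_chunked('movies.list');   // echoes the file in 1 MB pieces
echo "Delivered $bytes bytes\n";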
– Baba