2

For one of my projects I need to import a very large text file (~950MB). I'm using Symfony2 & Doctrine 2 for this project.

My problem is that I get errors like:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes)

The error even occurs if I increase the memory limit to 1GB.

I tried to analyze the problem with Xdebug and KCachegrind (as part of PHPEdit), but I don't really understand the values :(

I'm looking for a tool or a method (quick and simple, since I don't have much time) to find out why memory is allocated and not freed again.

Edit

To clear some things up, here is my code:

        $handle = fopen($geonameBasePath . 'allCountries.txt', 'r');

        $i = 0;
        $batchSize = 100;

        if($handle) {
            while (($buffer = fgets($handle,16384)) !== false) {

                if( $buffer[0] == '#') //skip comments
                    continue;
                //split parts
                $parts = explode("\t",$buffer);


                if( $parts[6] != 'P')
                    continue;

                if( $i%$batchSize == 0 )    {
                    echo 'Flush & Clear' . PHP_EOL;
                    $em->flush();
                    $em->clear();
                }

                $entity = $em->getRepository('MyApplicationBundle:City')->findOneByGeonameId( $parts[0] );
                if( $entity !== null)   {
                    $i++;
                    continue;
                }

                //create city object
                $city = new City();

                $city->setGeonameId( $parts[0] );
                $city->setName( $parts[1] );
                $city->setInternationalName( $parts[2] );
                $city->setLatitude($parts[4] );
                $city->setLongitude( $parts[5] );
                $city->setCountry( $em->getRepository('MyApplicationBundle:Country')->findOneByIsoCode( $parts[8] ) );

                $em->persist($city);

                unset($city);
                unset($entity);
                unset($parts);
                unset($buffer);

                echo $i . PHP_EOL;


                $i++;
            }
        }

        fclose($handle);

Things I have tried, but nothing helped:

  1. Adding a second parameter to fgets
  2. Increasing memory_limit
  3. Unsetting vars
Frido
  • We used to set the memory limit to 20GB for some scripts when we knew there could be temporarily large memory usage (such as downloading a 2GB file or so). :) – Vyktor Jan 29 '12 at 16:03
  • That is just crazy. Not everybody has 20GB of memory. Seriously... – i.am.michiel Jan 29 '12 at 16:50
  • I've watched the PHP process in the task manager; the memory usage keeps rising. I had this problem with C++ or Objective-C when I forgot a _delete_ or a _release_, but never with PHP. – Frido Jan 29 '12 at 19:04
  • Does this happen even if you cut out the ORM? – Ocramius Jan 29 '12 at 23:48

3 Answers

5

Increasing the memory limit is not going to be enough. When importing files like that, you should buffer the reading:

$f = fopen('yourfile', 'r');
while (($data = fread($f, 4096)) !== false && $data !== '') {
    // Do your stuff using the read $data
}
fclose($f);

Update:

When working with an ORM, you have to understand that nothing is actually inserted into the database until the flush call. All those objects are stored by the ORM, tagged as "to be inserted". Only when flush is called does the ORM check the collection and start inserting.

Solution 1: Flush often. And clear.
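
A rough sketch of that batching pattern with Doctrine (assuming $em is the EntityManager and $handle the open file handle from the question, and a batch size of 100):

$batchSize = 100;
$i = 0;

while (($buffer = fgets($handle, 16384)) !== false) {
    $parts = explode("\t", $buffer);

    $city = new City();
    $city->setGeonameId($parts[0]);
    $city->setName($parts[1]);
    $em->persist($city);   // only queued in the UnitOfWork, not inserted yet

    if (++$i % $batchSize === 0) {
        $em->flush();      // run the queued INSERTs
        $em->clear();      // detach everything so it can be garbage collected
    }
}

$em->flush();              // insert the last partial batch
$em->clear();

Keep in mind that clear() detaches every managed entity, so anything fetched earlier (like the Country reference in your code) has to be looked up again afterwards.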

Solution 2: Don't use the ORM. Go for plain SQL commands. They will take up far less memory than the object + ORM approach.
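
For example, a sketch going through the DBAL connection underneath the EntityManager (the table and column names here are assumptions, adjust them to your schema):

$conn = $em->getConnection();

$conn->executeUpdate(
    'INSERT INTO city (geoname_id, name, international_name, latitude, longitude) VALUES (?, ?, ?, ?, ?)',
    array($parts[0], $parts[1], $parts[2], $parts[4], $parts[5])
);

Since no entity objects are created, the memory used per row stays roughly constant.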

i.am.michiel
  • I'm using _fgets_, isn't that the same? – Frido Jan 29 '12 at 20:04
  • Not really; if you check http://php.net you can see they don't do the same thing. `fread` simply reads bytes from the file. `fgets` reads lines, starting at a given position. – i.am.michiel Jan 29 '12 at 21:16
  • Sorry, when I said they are the same, I meant they both read sequentially from a file. – Frido Jan 29 '12 at 21:42
0

33554432 bytes is 32MB.

Change the memory limit in php.ini, for example to 75MB:

memory_limit = 75M

and restart the server.
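
If editing php.ini is not an option, the limit can also be raised at runtime for just this script, e.g.:

ini_set('memory_limit', '1024M'); // or '-1' to remove the limit entirely, only for a trusted CLI import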

ZiTAL
  • "The error even occurs if I increase the memory limit to 1GB." – JJJ Jan 29 '12 at 16:34
  • There's no guarantee that a 950MB file will still be 950MB when it's put into PHP. For all you know, the actual memory footprint could be twice as much. – Adam Fowler Jan 29 '12 at 16:40
0

Instead of reading the whole file at once, you should read it line by line. Every time you read a line, process the data. Do NOT try to fit EVERYTHING in memory. You will fail. Even if you can fit the text file itself into RAM, you will not be able to also hold the data as PHP objects/variables at the same time, since PHP itself needs much more memory for each of them.

What I suggest instead is: a) read a new line, b) parse the data in the line, c) create the new object to store in the database, d) go to step a, unset()ting the old object first or reusing its memory.
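
A minimal sketch of that loop (reusing the City entity from the question; the remaining fields and error handling are omitted):

$handle = fopen('allCountries.txt', 'r');

while (($line = fgets($handle)) !== false) {  // a) read a new line
    $parts = explode("\t", $line);            // b) parse the data in the line
    $city = new City();                       // c) create the object to store
    $city->setGeonameId($parts[0]);
    $em->persist($city);

    unset($city, $parts, $line);              // d) drop the references before the next line
}

fclose($handle);

Note that unset() alone will not free the entities while the EntityManager still references them, so combine this with periodic flush()/clear() calls.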

ktolis