
I've got a CSV file of members that I receive once a month that contains ~6000 rows.

I'm (trying to) loop through the CSV file, check if the record already exists in the members table, and if so check whether it's the same data.

Then insert it into the pending table (with an `exists` flag where appropriate).

I'm using Laravel and League\CSV to read in the file that is saved in my storage folder:

class ImportController extends Controller
{
  public function import(Request $request) {

    $readDirectory = 'storage/csv/';
    $filename = $request->name;

    $stream = fopen($readDirectory.$filename, 'r');
    $reader = Reader::createFromStream($stream)->setHeaderOffset(0);
    $records = (new Statement())->process($reader);

    // Truncate the imported table prior to import
    Imported::truncate(); 

    foreach ($records as $record) {

        $email = $record['email'];

        $recordExists = $this->recordExists($email);

        if($recordExists) {
          // Compare the md5 of the recordArray and the memberArray and skip the record if it's the same.
          $memberArray = $this->getmemberArray($recordExists);
          $recordArray = $this->getRecordArray($record);

          if($memberArray['hash'] === $recordArray['hash']) { continue; }

          $record['exists'] = true;
          $this->write($record);
        } else {
          $record['exists'] = false;
          $this->write($record);
          Log::debug("missing: ".$record['URN']);
        }
      }
    // End Foreach Loop

    return redirect()->route('upload.show');
  }



  public function recordExists($email){
    $member = Member::where('email', 'LIKE', $email)->first();
    if ($member == null) { return false; }
    return $member;
  }

  public function getmemberArray($member) {
    $memberArray = [
      'email'       =>  $member->email,
      'first_name'  =>  $member->first_name,
      'last_name'   =>  $member->last_name,
      'age_years'   =>  $member->age_years,
      'gender'      =>  $member->gender,
      'address_1'   =>  $member->address_1,
      'address_2'   =>  $member->address_2,
      'address_3'   =>  $member->address_3,
      'town'        =>  $member->town,
      'county'      =>  $member->county,
      'postcode'    =>  $member->postcode,
      'sport_1'     =>  $member->sport_1,
      'sport_2'     =>  $member->sport_2,
    ];
    $memberArray['hash'] = md5(json_encode($memberArray));
    return $memberArray;
  }

  public function getRecordArray($record) {
    $recordArray = [
      'email'       =>  $record['email'], 
      'first_name'  =>  $record['first_name'], 
      'last_name'   =>  $record['last_name'], 
      'age_years'   =>  $record['age_years'], 
      'gender'      =>  $record['gender'],
      'address_1'   =>  $record['address_1'], 
      'address_2'   =>  $record['address_2'], 
      'address_3'   =>  $record['address_3'], 
      'town'        =>  $record['town'], 
      'county'      =>  $record['county'], 
      'postcode'    =>  $record['postcode'], 
      'sport_1'     =>  $record['sport_1'], 
      'sport_2'     =>  $record['sport_2'], 
    ];
    $recordArray['hash'] = md5(json_encode($recordArray));
    return $recordArray;
  }

  public function write($record) {

    $import = [
      'email'      => $record['email'],
      'first_name' => $record['first_name'],
      'last_name'  => $record['last_name'],
      'age_years'  => $record['age_years'],
      'gender'     => $record['gender'],
      'address_1'  => $record['address_1'],
      'address_2'  => $record['address_2'],
      'address_3'  => $record['address_3'],
      'town'       => $record['town'],
      'county'     => $record['county'],
      'postcode'   => $record['postcode'],
      'sport_1'    => $record['sport_1'],
      'sport_2'    => $record['sport_2'],
      'exists'     => $record['exists'],
    ];

    DB::table('imported')->insert($import);

    Log::debug($record['email']);

    return true;
  }
}

But I keep getting:

Symfony \ Component \ Debug \ Exception \ FatalErrorException (E_UNKNOWN) Allowed memory size of 134217728 bytes exhausted (tried to allocate 181321056 bytes)

It works if I use a lot less rows in my CSV, but that's not an option.

I was previously writing to the DB using eloquent->save(), but changed it to DB::table()->insert to improve performance.
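For what it's worth, the difference between the two approaches can be sketched like this (illustrative only; the `Imported` model, column names, and the 500-row batch size follow the code above and are assumptions, not the exact original code):

```php
<?php
// Per-row Eloquent save: hydrates a model and fires events for every row.
// $imported = new Imported($import);
// $imported->save();

// Query-builder insert: skips model hydration and events, one query per row.
// DB::table('imported')->insert($import);

// The same idea extends further: group rows and insert many per query,
// cutting database round-trips from ~6000 to a handful.
$rows = [
    ['email' => 'a@example.com', 'exists' => false],
    ['email' => 'b@example.com', 'exists' => false],
];
foreach (array_chunk($rows, 500) as $batch) {
    // DB::table('imported')->insert($batch);
}
```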

I've already added the following for testing purposes, but it's still breaking.

set_time_limit(0);
ini_set('max_execution_time', 100000);
ini_set('memory_limit','512m');

Am I missing something? Some kind of memory leak somewhere?

I'm guessing it's keeping the record in memory each time, so is there any way to make it forget after each row?

ALSO: Is there a way to clear this memory, so that I can edit the code and retry?

Even if I stop and re-run php artisan serve it still keeps the same error message.

n8udd
  • Apologies for the code. I'm relatively new to PHP, and especially OOP. I'm sure there are much better ways to do this than just chucking it in the controller, but for now, I just need it to work. Then I'll look at DRY, SOLID etc after. – n8udd Sep 27 '18 at 13:00
  • https://stackoverflow.com/questions/34864524 - see if information here helps. Up the allowed memory limit in php.ini directly and see what happens. The other option is to load data directly into your DB first and then use Laravel to process that information. Another thread: https://laracasts.com/discuss/channels/laravel/allowed-memory-size-of-536870912-bytes-exhausted – zedfoxus Sep 27 '18 at 13:18
  • As mentioned, I've already upped the memory limit with `ini_set`, and it's still timing out. Uploading to the database directly isn't an option, as this is for an admin of a local club. They receive the data on their members as a CSV monthly, and want to update or add new records. They don't have technical knowledge. – n8udd Sep 27 '18 at 13:21
  • 6000 rows but how large are the data? Is the memory use justified from loading the CSV or is the memory running out somewhere else? – apokryfos Sep 27 '18 at 13:25

2 Answers

3

The problem here is that League\CSV is reading the whole CSV file into memory when you do:

$records = (new Statement())->process($reader);

You should use the Reader's chunk method like this, so that only a specific number of rows is read at a time:

foreach($reader->chunk(50) as $row) {
    // do whatever
}

The chunk method returns a Generator that you can iterate over. You can find this mentioned here in the documentation.

EDIT: I misread the documentation and recommended the wrong method.

You basically just have to iterate over the $reader itself:

foreach ($reader as $row) {
    print_r($row);
}
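Putting that together with the import loop from the question, a memory-friendlier version might look like the sketch below. It iterates the Reader lazily and buffers query-builder inserts in batches (the 500-row batch size, the `createFromPath`/`storage_path` setup, and the reuse of the `imported` table are assumptions for illustration, and the exists-check is omitted for brevity):

```php
<?php
// Sketch only: assumes a Laravel app with League\CSV installed.
use League\Csv\Reader;
use Illuminate\Support\Facades\DB;

$reader = Reader::createFromPath(storage_path('csv/'.$filename), 'r');
$reader->setHeaderOffset(0);

$batch = [];
foreach ($reader as $record) {       // lazy: one row held in memory at a time
    $batch[] = [
        'email'  => $record['email'],
        'exists' => false,           // exists-check from the question omitted here
    ];
    if (count($batch) >= 500) {      // flush a full batch in a single query
        DB::table('imported')->insert($batch);
        $batch = [];
    }
}
if ($batch !== []) {                 // flush any remainder
    DB::table('imported')->insert($batch);
}
```

This way only one batch of rows is alive at any time, instead of the whole file plus one queued insert per row.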

Also, if you are using a Mac, or if your CSV was created on one, you need the following to be able to successfully read large CSV files:

if (!ini_get('auto_detect_line_endings')) {
    ini_set('auto_detect_line_endings', '1');
}

See this part of the documentation.

VaaChar
  • You should use it directly on the `$reader` since `$records` will already contain the whole CSV file's content and you only want to read it in chunks. – VaaChar Sep 27 '18 at 13:25
  • Ok, so I'd want to do this: `$stream = fopen($readDirectory.$filename, 'r'); $reader = Reader::createFromStream($stream, 'r')->setHeaderOffset(0); foreach ($reader->chunk(512) as $chunk) { $records = (new Statement())->process($chunk); } // Truncate the imported table prior to import Imported::truncate(); foreach ($records as $record) {...` – n8udd Sep 27 '18 at 13:27
  • @n8udd Please check out the edit I made. I misread the documentation and recommended the wrong method. I've corrected my post with a method that works and also added a tip for OS-specific problems. – VaaChar Sep 27 '18 at 16:24
-2

I gather that you are using php artisan serve to run your server. You could try deploying an actual web server, since you will be using one in a production environment anyway. Apache is one option; it comes bundled with XAMPP for Windows and Linux.

You can look up how to install Apache HTTP Server or Nginx on your operating system. These offer better control over memory use than PHP's built-in development server.

Bw. Kizito
  • I am trying to avoid just throwing more memory at it, and instead trying to identify why it's haemorrhaging memory. – n8udd Sep 27 '18 at 13:23