2

I need to import a lot of data from a csv file (45 Mo) in myqsl database with Symfony. I imported League\Csv\Reader library I made a command with doctrine. It works but I is very slow. How can I accelerate this ?

I tried to :

  1. adding : $this->em->clear() after $this->em->flush();

  2. adding : //Disable SQL Logging: to avoid huge memory loss. $this->em->getConnection()->getConfiguration()->setSQLLogger(null);

.

namespace App\Command;

use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;
use Symfony\Component\Console\Style\SymfonyStyle;
use App\Entity\Developer;
use App\Entity\BadgeLabel;
use Doctrine\ORM\EntityManagerInterface;
use League\Csv\Reader;

class CsvImportCommand extends Command
{
    public function __construct(EntityManagerInterface $em){

        parent::__construct();

        $this->em = $em;
    }

    // the name of the command (the part after "bin/console")
    protected static $defaultName = 'app:import-developpers';

    protected function configure()
    {
        $this
        // the short description shown while running "php bin/console list"
        ->setDescription('Import a new developper.')

        // the full command description shown when running the command with
        // the "--help" option
        ->setHelp('This command allows you to import a develpper...')
    ;

    }

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        $io = new SymfonyStyle($input, $output);
        $io->title('Importation en cours');


        $reader = Reader::createFromPath('%kernel.root_dir%/../src/Data/developers_big.csv')
            ->setHeaderOffset(0)
        ;
        $results = $reader->getrecords();
        $io->progressStart(iterator_count($results));

        //Disable SQL Logging: to avoid huge memory loss.
        $this->em->getConnection()->getConfiguration()->setSQLLogger(null);


        foreach ($results as $row) {
            $developer = $this->em->getRepository(Developer::class)
            ->findOneBy([
                'firstName' => ($row['FIRSTNAME']),
                'lastName'=> ($row['LASTNAME'])
            ])
            ;

            if (null === $developer) {
                $developer = new developer;
                $developer
                    ->setFirstName($row['FIRSTNAME'])
                    ->setLastName($row['LASTNAME']);
                $this->em->persist($developer);
                $this->em->flush();
                $this->em->clear();             
            }

            $badgeLabel = $this->em->getRepository(BadgeLabel::class)
                ->findOneBy([
                    'name' => ($row['BADGE LABEL']),
                    'level'=> ($row['BADGE LEVEL'])
                ])
            ;

            if (null === $badgeLabel) {
                $badgeLabel = new BadgeLabel;
                $badgeLabel
                    ->setName($row['BADGE LABEL'])
                    ->setLevel($row['BADGE LEVEL']);
                $this->em->persist($badgeLabel);
                $this->em->flush();
                $this->em->clear();

            }
            $developer
                ->addBadgeLabel($badgeLabel);

            $io->progressAdvance();
        }

        $this->em->flush();
        $this->em->clear();

        $io->progressFinish();
        $io->success('Importation terminée avec succès');
    }
}

The command works put its to slow. After 15 min, only 32% was updload in my Mysql database. I Expected it in 2 minutes max

RiggsFolly
  • 93,638
  • 21
  • 103
  • 149
  • You should avoid the two "findOneBy" in your loop. Try to fetch all your developer and badgeLabel outside the foreach. Also take look at doctrine batch https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/reference/batch-processing.html – Picoss May 07 '19 at 15:41
  • 1
    You should use a profiler like XHProf or blackfire.io. This will tell which parts of your code are slowing things down and you can tackle those more directly. As a general advice, you might want to avoid Doctrine ORM and use DBAL to avoid unnecessary object hydration, e.g. fpr `$badgeLabel` and `$developer` you don't need the actual objects. You might also want to perform batch operations on multiples developers/badges at once if possible to reduce the amount of queries. Whether this is necessary very much depends on what your profiler says is the issue. – dbrumann May 07 '19 at 16:08

1 Answers1

1

Method1: (not the best)

When flush method is called, Symfony go throught all listeners. So, you could avoid to flush on each loop. You can replace each flush by this code:

if (0 === ($batchSize++ % $input->getOption('fetch'))) {
    $this->entityManager->flush();
    $this->entityManager->clear();
}

fetch option can be declared in configure method:

    const BATCH_SIZE = 1000; // As example

    /**
     * Configure the command.
     */
    protected function configure()
    {
        $this
        // the short description shown while running "php bin/console list"
        ->setDescription('Import a new developper.')

        //This option helps you to find a good value and use BATCH_SIZE constant as default
        ->addOption('fetch', 'f', InputArgument::OPTIONAL, 'Number of loop between each flush', self::BATCH_SIZE)

        // the full command description shown when running the command with
        // the "--help" option
        ->setHelp('This command allows you to import a develpper...')
    ;

Method2: More efficient

You can create a command which writes all SQL queries with update or insert in a sql file. Then, you launch a native command that read the files and execute queries.

Method3: Using DBAL As suggested in comments, youcould use DBAL to avoid unnecessary object hydration with Doctrine.

Alexandre Tranchant
  • 4,426
  • 4
  • 43
  • 70