
I have a problem importing data from a CSV file into the database.

For now, my code looks like this:

public function run()
{
    $this->startProgressBar();
    foreach ($this->elements as $element) {
        $this->insertCity($element);
        $this->advanceProgressBar();
    }
    $this->finishProgressBar();
}

/**
 * @param array $item
 */
private function insertCity(array $item = [])
{
    $repository = $this->getDoctrine()->getRepository(Commune::class);
    $commune = $repository->findOneByTerc($this->getTerc($item));

    $district = $item['uid'] == $item['district_uid'] ? null : $item['district_uid'];

    $city = new City();
    $city->setName($item['name']);
    $city->setCommuneId($commune->getId());
    $city->setDistrictUid($district);
    $city->setType($item['city_type']);
    $city->setUid($item['uid']);

    $this->getDoctrine()->getManager()->persist($city);
    $this->getDoctrine()->getManager()->flush();
}

For every row I run a select and an insert. My CSV file has 100k rows. In one hour, this code imports only 10k rows :(

Any ideas, how can I optimize it?

Filip.

    Do as little as possible within a loop, don't fetch the repository and entity manager inside the loop, and do not flush every time. – JimL Aug 16 '17 at 07:41
    I'd consider just using a simple dql query and executing it. You should prepare the query outside the loop and execute inside the loop. And not flush the changes too often – JimL Aug 16 '17 at 07:45
  • @JimL what do you mean about preparing query outside the loop? Can you provide an example? – Filip Aug 16 '17 at 07:49
  • Try improving the current code first and see if it works better. If you change it so you don't flush every iteration, Doctrine will already use prepared statements; I'm not sure switching to DQL will make much difference. – JimL Aug 16 '17 at 07:52
  • Could you add `getTerc` as well? – JimL Aug 16 '17 at 07:52
  • @JimL getTerc is simple: `protected function getTerc(array $item = []) : string { $province = $item['province']; $county = $item['county'] == "" ? "00" : $item['county']; $commune = $item['commune'] == "" ? "00" : $item['commune']; $type = $item['type'] == "" ? "0" : $item['type']; return $province . $county . $commune . $type; }` – Filip Aug 16 '17 at 07:54
    You should persist each iteration, but only flush (update the database) when you've queued up (quite) a few entities to persist. – JimL Aug 16 '17 at 07:55
  • Thanks @JimL, after some changes I see huge boost in performance! :) – Filip Aug 16 '17 at 08:04
    Great :) Always remember to try to do as little as possible inside loops, remember you're running that code 100k times ^^ it gets even worse if you call a function inside a loop that loops over some stuff, etc. Keep it simple :) – JimL Aug 16 '17 at 08:05
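The advice in the comments above can be sketched as follows. This is a hypothetical rewrite of the question's `run()`/`insertCity()` code, assuming the same `Commune`/`City` entities, `$this->elements`, and `getTerc()` helper; the batch size of 500 is an arbitrary starting point to tune:

```php
public function run()
{
    // Fetch the entity manager and repository ONCE, outside the loop.
    $em = $this->getDoctrine()->getManager();
    $repository = $em->getRepository(Commune::class);

    $batchSize = 500; // tune to taste

    $this->startProgressBar();
    foreach ($this->elements as $i => $item) {
        $commune = $repository->findOneByTerc($this->getTerc($item));

        $district = $item['uid'] == $item['district_uid'] ? null : $item['district_uid'];

        $city = new City();
        $city->setName($item['name']);
        $city->setCommuneId($commune->getId());
        $city->setDistrictUid($district);
        $city->setType($item['city_type']);
        $city->setUid($item['uid']);

        // Persist every row, but flush only once per batch.
        $em->persist($city);
        if (($i % $batchSize) === 0) {
            $em->flush();
            $em->clear(); // detach managed entities so memory stays flat
        }
        $this->advanceProgressBar();
    }
    $em->flush(); // write the final partial batch
    $this->finishProgressBar();
}
```

Note that `$em->clear()` detaches *all* managed entities, including the fetched `Commune` objects; that is fine here because only `$commune->getId()` is read before the next fetch.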

1 Answer


Use SQL.

From the Doctrine documentation, in the section about mass processing:

An ORM tool is not primarily well-suited for mass inserts, updates or deletions.
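A hypothetical sketch of this approach using Doctrine's DBAL layer: prepare the `INSERT` once, execute it per row, and wrap everything in one transaction. The table and column names are assumptions based on the entity setters in the question, and `$communeIds` is a hypothetical terc-to-id map you would preload with one query instead of doing a select per row:

```php
$conn = $this->getDoctrine()->getConnection();

// Prepare the statement once, outside the loop.
$stmt = $conn->prepare(
    'INSERT INTO city (name, commune_id, district_uid, type, uid)
     VALUES (:name, :commune_id, :district_uid, :type, :uid)'
);

$conn->beginTransaction();
foreach ($this->elements as $item) {
    $stmt->execute([
        'name'         => $item['name'],
        'commune_id'   => $communeIds[$this->getTerc($item)], // preloaded terc => id map
        'district_uid' => $item['uid'] == $item['district_uid'] ? null : $item['district_uid'],
        'type'         => $item['city_type'],
        'uid'          => $item['uid'],
    ]);
}
$conn->commit();
```

This skips the unit of work entirely, so there is no per-row change tracking and no flush cost; the trade-off is that you bypass entity lifecycle events and validation.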

svgrafov
  • Nice downvote on the only correct answer. Care to comment with anything useful? – svgrafov Aug 16 '17 at 08:55
  • Not sure as it wasn't me, but I'm guessing it's because you simply said 'Use SQL' with no explanation as to how or why, and then you linked to documentation that actually showed a way to actually do bulk inserts and improve the speed while still using Doctrine, but skipped over that and instead simply put that quote. I'd probably edit your answer to be a little more specific. – Jason Roman Aug 16 '17 at 13:51