I'm reading a very long text file where each line consists of an ID, a groupID, and additional data. Each ID can be associated with many groupIDs (the first three sample lines below), and each ID-groupID combo can be associated with many data entries (sample lines 2 and 3).
JWOFJ903JCKDF8O | groupID-22 | some data
JWOFJ903JCKDF8O | groupID-33 | same ID as above, but different groupID and data
JWOFJ903JCKDF8O | groupID-33 | same ID and groupID as above, but different data
...
DF8#CKJ90JJ3WOF | groupID-22 | some data
...
I'm moving this data to a database, so I have a table for the IDs (no ID duplication), a table for the ID and groupID (no ID-groupID duplication), and a table for the data, which references the ID-groupID table.
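My schema looks roughly like this in Propel's schema.xml (table and column names are simplified placeholders; the unique indexes enforce the no-duplication rules):

<!-- Names are placeholders; the unique indexes prevent duplicate IDs and duplicate ID-groupID pairs. -->
<table name="identifier">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="value" type="varchar" size="64" required="true"/>
  <unique>
    <unique-column name="value"/>
  </unique>
</table>
<table name="identifier_group">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="identifier_id" type="integer" required="true"/>
  <column name="group_id" type="varchar" size="64" required="true"/>
  <foreign-key foreignTable="identifier">
    <reference local="identifier_id" foreign="id"/>
  </foreign-key>
  <unique>
    <unique-column name="identifier_id"/>
    <unique-column name="group_id"/>
  </unique>
</table>
<table name="data_entry">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="identifier_group_id" type="integer" required="true"/>
  <column name="data" type="longvarchar" required="true"/>
  <foreign-key foreignTable="identifier_group">
    <reference local="identifier_group_id" foreign="id"/>
  </foreign-key>
</table>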
So to insert one line into the database, I first check whether the ID already exists in the ID table and insert it if it doesn't. Then I check whether the ID-groupID combo exists in the ID-groupID table and insert it if it doesn't. Finally, I insert the data under that ID-groupID row. Per line, my Propel logic looks roughly like this (the generated class names match the placeholder schema above):
// Does this $id already exist in the IDs table?
$identifier = IdentifierQuery::create()->findOneByValue($id);
if (!$identifier) {
    $identifier = new Identifier();
    $identifier->setValue($id);
    $identifier->save();
}

// Does this ID-groupID combo already exist in the ID-groupID table?
$combo = IdentifierGroupQuery::create()
    ->filterByIdentifier($identifier)->filterByGroupId($groupId)->findOne();
if (!$combo) {
    $combo = new IdentifierGroup();
    $combo->setIdentifier($identifier);
    $combo->setGroupId($groupId);
    $combo->save();
}

// Does this data already exist in the third table under this combo?
$entry = DataEntryQuery::create()
    ->filterByIdentifierGroup($combo)->filterByData($data)->findOne();
if (!$entry) {
    $entry = new DataEntry();
    $entry->setIdentifierGroup($combo);
    $entry->setData($data);
    $entry->save();
}
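Every line in the file goes through those three checks; the read loop is just a sketch like this (the file name and the insertLine() wrapper are hypothetical stand-ins for my real code):

$handle = fopen('input.txt', 'r');
while (($line = fgets($handle)) !== false) {
    // Split "ID | groupID | data" on the first two pipes and trim the pieces.
    list($id, $groupId, $data) = array_map('trim', explode('|', $line, 3));
    insertLine($id, $groupId, $data); // hypothetical wrapper around the checks above
}
fclose($handle);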
The problem is that because the file is very large (100,000 lines), the process takes hours to complete. Is there something I can do to optimize my Propel queries, or to improve the design of the database?