I'm reading a very long text file where each line consists of an ID, a groupID, and additional data. Each ID can be associated with many groupIDs (the first three sample lines below), and each ID-groupID combo can be associated with many data entries (sample lines 2 and 3).
JWOFJ903JCKDF8O | groupID-22 | some data
JWOFJ903JCKDF8O | groupID-33 | same ID as above, but different groupID and data
JWOFJ903JCKDF8O | groupID-33 | same ID and groupID as above, but different data
...
DF8#CKJ90JJ3WOF | groupID-22 | some data
...
I'm moving this data to a database, so I have a table for the IDs (no ID duplication), a table for the ID and groupID (no ID-groupID duplication), and a table for the data, which references the ID-groupID table.
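My schema looks roughly like this in Propel's schema.xml (table and column names are simplified placeholders; the unique indexes enforce the no-duplication rules):

<!-- Names are placeholders; the unique indexes prevent duplicate IDs and duplicate ID-groupID pairs. -->
<table name="identifier">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="value" type="varchar" size="64" required="true"/>
  <unique>
    <unique-column name="value"/>
  </unique>
</table>
<table name="identifier_group">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="identifier_id" type="integer" required="true"/>
  <column name="group_id" type="varchar" size="64" required="true"/>
  <foreign-key foreignTable="identifier">
    <reference local="identifier_id" foreign="id"/>
  </foreign-key>
  <unique>
    <unique-column name="identifier_id"/>
    <unique-column name="group_id"/>
  </unique>
</table>
<table name="data_entry">
  <column name="id" type="integer" primaryKey="true" autoIncrement="true"/>
  <column name="identifier_group_id" type="integer" required="true"/>
  <column name="data" type="longvarchar" required="true"/>
  <foreign-key foreignTable="identifier_group">
    <reference local="identifier_group_id" foreign="id"/>
  </foreign-key>
</table>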
So to insert one line into the database, I first check whether the ID already exists in the ID table and insert it if it doesn't. Then I check whether the ID-groupID combo exists in the ID-groupID table and insert it if it doesn't. Finally, I insert the data under that ID-groupID row. Per line, my Propel logic looks roughly like this (the generated class names match the placeholder schema above):
// Does this $id already exist in the IDs table?
$identifier = IdentifierQuery::create()->findOneByValue($id);
if (!$identifier) {
    $identifier = new Identifier();
    $identifier->setValue($id);
    $identifier->save();
}

// Does this ID-groupID combo already exist in the ID-groupID table?
$combo = IdentifierGroupQuery::create()
    ->filterByIdentifier($identifier)->filterByGroupId($groupId)->findOne();
if (!$combo) {
    $combo = new IdentifierGroup();
    $combo->setIdentifier($identifier);
    $combo->setGroupId($groupId);
    $combo->save();
}

// Does this data already exist in the third table under this combo?
$entry = DataEntryQuery::create()
    ->filterByIdentifierGroup($combo)->filterByData($data)->findOne();
if (!$entry) {
    $entry = new DataEntry();
    $entry->setIdentifierGroup($combo);
    $entry->setData($data);
    $entry->save();
}
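Every line in the file goes through those three checks; the read loop is just a sketch like this (the file name and the insertLine() wrapper are hypothetical stand-ins for my real code):

$handle = fopen('input.txt', 'r');
while (($line = fgets($handle)) !== false) {
    // Split "ID | groupID | data" on the first two pipes and trim the pieces.
    list($id, $groupId, $data) = array_map('trim', explode('|', $line, 3));
    insertLine($id, $groupId, $data); // hypothetical wrapper around the checks above
}
fclose($handle);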
The problem is that because the file is very large (100,000 lines), the process takes hours to complete. Is there something I can do to optimize my Propel queries, or to improve the design of the database?