I'm normalizing a database and need to run ~630k updates. Here is the basic structure of my tables:
domains
- id
- name
stats
- domain
- domain_id
Before, the database didn't have a domains table and the domains were stored in multiple tables, sometimes as lists (JSON text). I migrated every domain to the domains table and now I need to establish the relation to the stats table, which has the domain column. I added the domain_id column and tried to update it to match the id of the corresponding domain in the domains table. The stats table has over 23m rows with ~630k unique domains (the stats are hourly). I tried running a foreach, but it takes about 2 seconds per domain, which adds up to about 14 days to run all of them.
Here is my code so far:
First I find all the domains from the stats table that are missing in the domains table and save them in the domains table.
$statDomains = Stat::select('domain')->groupBy('domain')->lists('domain');
$domains = [];
foreach (array_chunk($statDomains, 1000) as $domains1k) {
    $domains = array_merge($domains, Domain::whereIn('name', $domains1k)->lists('name'));
}
$missingDomains = [];
foreach (array_diff($statDomains, $domains) as $missingDomain) {
    $missingDomains[] = ['name' => $missingDomain];
}
if (!empty($missingDomains)) {
    Domain::insert($missingDomains);
}
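As an aside, this "insert only the missing domains" step can also be expressed as one set-based statement instead of chunked lookups and an array_diff. A minimal sketch using Python's sqlite3 as a stand-in (table and column names follow the question; the sample rows are made up, and the SQL itself should port to MySQL/Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stats (domain TEXT, domain_id INTEGER);
INSERT INTO domains (name) VALUES ('a.com');
INSERT INTO stats (domain) VALUES ('a.com'), ('b.com'), ('b.com'), ('c.com');
""")

# One statement replaces the chunked whereIn() lookups and array_diff():
# insert every domain seen in stats that is not yet in domains.
conn.execute("""
INSERT INTO domains (name)
SELECT DISTINCT s.domain
FROM stats s
WHERE NOT EXISTS (SELECT 1 FROM domains d WHERE d.name = s.domain)
""")

names = [r[0] for r in conn.execute("SELECT name FROM domains ORDER BY name")]
print(names)  # ['a.com', 'b.com', 'c.com']
```

With an index on domains.name the NOT EXISTS probe stays cheap even at ~630k unique domains.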
Next I get all the domains from the domains table that are present in the stats table and update all rows from the stats table with that domain.
$domains = [];
foreach (array_chunk($statDomains, 1000) as $domains1k) {
    $domains += Domain::whereIn('name', $domains1k)->lists('name', 'id');
}
foreach ($domains as $key => $domain) {
    Stat::where('domain', $domain)->update(['domain_id' => $key]);
}
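For comparison, the per-domain update loop above can be collapsed into a single joined UPDATE. A sketch with sqlite3, using a correlated subquery (sample rows are made up; on MySQL the equivalent would be an UPDATE ... JOIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stats (domain TEXT, domain_id INTEGER);
INSERT INTO domains (name) VALUES ('a.com'), ('b.com');
INSERT INTO stats (domain) VALUES ('a.com'), ('b.com'), ('a.com');
""")

# One set-based UPDATE instead of ~630k per-domain UPDATE statements.
conn.execute("""
UPDATE stats
SET domain_id = (SELECT id FROM domains WHERE domains.name = stats.domain)
""")

rows = list(conn.execute("SELECT domain, domain_id FROM stats ORDER BY rowid"))
print(rows)  # [('a.com', 1), ('b.com', 2), ('a.com', 1)]
```

The whole 23m-row table is then rewritten in one pass rather than one scan per domain.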
I would appreciate something in Eloquent, query builder or just raw SQL that will do the updating faster (up to two hours?). I did some googling and found a similar question but was unable to apply it to my case.
EDIT
I'm running the suggested solutions right now. In the meantime I found that two more parts of my migration are taking about 50 minutes. In the first one, I have a table domain_lists. It has a text column domains with JSON-encoded domains. I'm moving those domains to the domains table and creating records in the domain_lists_domains_map table. Here's the code:
foreach (DomainList::all() as $domainList) {
    $attach = [];
    $domains = json_decode($domainList->domains, true);
    foreach ($domains as $domain) {
        $model = Domain::where('name', '=', $domain)->first();
        if (is_null($model) && !is_null($domain)) {
            $model = new Domain();
            $model->name = $domain;
            $model->save();
        }
        if (!is_null($model)) {
            $attach[] = $model->id;
        }
    }
    if (!empty($attach)) {
        foreach (array_chunk(array_unique($attach), 1000) as $attach1k) {
            $domainList->domains()->attach($attach1k);
        }
    }
}
I already noticed that I should probably find all the unique domains and insert them into the domains table first, but given the solution to the previous question I feel that there might be a much better way of doing all this in just raw SQL. The second part is very similar and I can probably figure out how to solve it by looking at the code of the first one. The table is categories and it also has a domains text column with JSON-encoded domains. Any help greatly appreciated.
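The restructuring I have in mind (decode every JSON list once, collect the unique names, bulk-insert them, then build all the map rows from an in-memory lookup instead of a first()/save() per domain) would look roughly like this. A Python sketch with made-up in-memory rows standing in for the domain_lists table:

```python
import json

# Hypothetical (id, domains JSON text) rows from the domain_lists table.
domain_lists = [
    (1, json.dumps(["a.com", "b.com"])),
    (2, json.dumps(["b.com", "c.com", None])),
]

# Pass 1: decode each JSON list once, skip nulls, collect unique names.
unique = set()
decoded = {}
for list_id, payload in domain_lists:
    names = [d for d in json.loads(payload) if d is not None]
    decoded[list_id] = names
    unique.update(names)

# Pass 2: bulk-insert the unique names (here just assigned ids in memory),
# then build every map row in one go instead of a query per domain.
domain_ids = {name: i for i, name in enumerate(sorted(unique), start=1)}
map_rows = [(list_id, domain_ids[n])
            for list_id, names in decoded.items() for n in names]
print(sorted(map_rows))  # [(1, 1), (1, 2), (2, 2), (2, 3)]
```

The same two-pass shape translates to PHP: one Domain::insert() of the unique names, one lists()/pluck() to get the name-to-id map, then a single bulk insert into domain_lists_domains_map.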
EDIT 2
Here's the query I ran to copy the existing table to a new one with the domain_id column populated:
CREATE TABLE "stats_new" AS SELECT
"s"."domain",
"d"."id" AS "domain_id"
FROM
"stats" "s"
JOIN "domains" "d" ON ("s"."domain" = "d"."name")