This is something weird that I can't really understand. I'm working with PostgreSQL 9.2 ...
I have this database:
movies (id, title, votes)
infos (id, movie_id, info_type, value)
I want to update movies.votes with infos.value, joining on movies.id = infos.movie_id, and only where info_type = 100 (which is the type for votes).
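For reference, a minimal sketch of the schema (the column types here are my assumption; value is stored as text, which is why the casts appear below):

```sql
CREATE TABLE movies (
    id    integer PRIMARY KEY,
    title text,
    votes integer
);

CREATE TABLE infos (
    id        integer PRIMARY KEY,
    movie_id  integer REFERENCES movies(id),
    info_type integer,
    value     text  -- vote counts stored as text when info_type = 100
);
```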
I tried 2 different queries:
update movies
set votes = cast(i.value as integer)
from movies m inner join infos i on m.id = i.movie_id
where i.info_type = 100
which (according to EXPLAIN) has an estimated cost of about 11 million, far too much!
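Rereading this, I suspect that listing movies again in the FROM clause makes PostgreSQL cross-join the update target with the aliased copy m, which would explain the huge estimate. The form I probably intended (untested) joins infos directly:

```sql
-- In PostgreSQL's UPDATE ... FROM, the target table should NOT be
-- repeated in FROM; it is joined via the WHERE clause instead.
UPDATE movies m
SET votes = CAST(i.value AS integer)
FROM infos i
WHERE m.id = i.movie_id
  AND i.info_type = 100;
```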
second try:
update movies m
set votes = cast(
(
select value
from infos i
where i.info_type = 100 and i.movie_id = m.id
limit 1
) AS integer);
This one shows an estimated cost of "only" 20 thousand, still far too much.
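A side note on this second form: as written it visits every row of movies, and for movies without a matching info row the subquery returns nothing, so votes is set to NULL. A guarded variant (untested) restricts the update to movies that actually have a votes record:

```sql
UPDATE movies m
SET votes = CAST(
    (SELECT value
     FROM infos i
     WHERE i.info_type = 100 AND i.movie_id = m.id
     LIMIT 1) AS integer)
-- only touch rows that have a matching info record
WHERE EXISTS (
    SELECT 1 FROM infos i
    WHERE i.info_type = 100 AND i.movie_id = m.id);
```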
I don't really know how query planning works, so I tried to do this with a Ruby script (using ActiveRecord):
Info.find_in_batches(:conditions => "info_type = 100") do |group|
  group.each do |info|
    movie = Movie.find(info.movie_id)  # look up the matching movie
    movie.votes = info.value.to_i      # value is stored as text
    movie.save
  end
end
For those of you who don't read Ruby: this script simply loops through all infos rows that meet the info_type = 100 condition, and for each one it looks up the corresponding movie and updates it.
And it was very fast! Just a few minutes, even with all the Ruby/ORM overhead!
Now, why? Note that movies has about 600k records, but only 200k (a third) have an info record with the number of votes. Still, that alone doesn't explain what is happening.