There's a bunch of performance problems here if you need to do this millions of times.
You're preparing the same SQL statement over and over again, millions of times. It would work better to prepare it once and execute it millions of times.
You're disconnecting from the database on every function call after a single query. That means you need to reconnect each time and any cached information is thrown away. Don't do that, leave it connected.
You're committing after each row. This will slow things down. Instead, commit after doing a batch.
The select + update or insert can probably be done as a single upsert.
That you're inserting so much into a temp table is probably a performance issue.
If the table has too many indexes that can slow inserts. Sometimes it's best to drop indexes, do a big batch update, and recreate them.
Because you're putting values directly into your SQL, your SQL is open to a SQL injection attack.
Instead...
- Use prepared statements and bind parameters
- Leave the database connected
- Do updates in bulk
- Only commit at the end of a run of updates
- Do all the math in the
UPDATE
rather then SELECT + math + UPDATE
.
- Use an "UPSERT" instead of
SELECT
then UPDATE
or INSERT
First off, prepared statements. These let MySQL compile the statement once and then reuse it. The idea is you write a statement with placeholders for the values.
select id, position, impressions, clicks, ctr
from temp
where profile_id=%s and
keyword=%s and
landing_page=%s
Then you execute that with the values as arguments, not as part of the string.
self.cursor.execute(
'select id, position, impressions, clicks, ctr from temp where profile_id=%s and keyword=%s and landing_page=%s',
(profile_id, keyword, landing_page)
)
This allows the database to cache the prepared statement and not have to recompile it each time. It also avoids a SQL injection attack where a clever attacker can craft a value that is actually more SQL like " MORE SQL HERE "
. It is a very, very, very common security hole.
Note, you might need to use MySQL's own Python database library to get true prepared statements. Don't worry about it too much, using prepared statements is not your biggest performance problem.
Next, what you're basically doing is adding to an existing row, or if there is no existing row, inserting a new one. This can be done more efficiently in a single statement with an UPSERT
, a combined INSERT
and UPDATE
. MySQL has it as INSERT ... ON DUPLICATE KEY UPDATE
.
To see how this is done, we can write your SELECT then UPDATE
as a single UPDATE
. The calculations are done in the SQL.
update temp
set impressions = impressions + %s,
clicks = clicks + %s,
ctr = (ctr + %s / 2)
where profile_id=%s and
keyword=%s and
landing_page=%s
Your INSERT remains the same...
insert into temp
(profile_id, landing_page, keyword, position, impressions, clicks, ctr)
values (%s, %s, %s, %s, %s, %s, %s)
Combine them into one INSERT ON DUPLICATE KEY UPDATE.
insert into temp
(profile_id, landing_page, keyword, position, impressions, clicks, ctr)
values (%s, %s, %s, %s, %s, %s, %s)
on duplicate key update
update temp
set impressions = impressions + %s,
clicks = clicks + %s,
ctr = (ctr + %s / 2)
This depends on what the keys of the table are defined as. If you have unique( profile_id, landing_page, keyword )
then it should work the same as your code.
Even if you can't do the upsert, you can eliminate the SELECT
by trying the UPDATE
, checking if it updated anything, and if it didn't doing an INSERT
.
Do the updates in bulk. Instead of calling a subroutine which does one update and commits, pass it a big list of things to be updated and work on them in a loop. You can even take advantage of executemany
to run the same statement with multiple values. Then commit.
You might be able to do the UPSERT
in bulk. INSERT
can take multiple rows at once. For example, this inserts three rows.
insert into whatever
(foo, bar, baz)
values (1, 2, 3),
(4, 5, 6),
(7, 8, 9)
You can likely do the same with your INSERT ON DUPLICATE KEY UPDATE
reducing the amount of overhead to talk to the database. See this post for an example (in PHP, but you should be able to adapt).
This sacrifices returning the ID of the last inserted row, but them's the breaks.