Given this query, which column or columns should be indexed to optimize query performance?
SELECT *
FROM `activities`
WHERE (user_id = 90000 AND activity_type_id IN(300,400,808,9494))
ORDER BY created_at DESC
LIMIT 70
Certainly all columns in WHERE clauses should be indexed.
But the IN clause is likely to require a table scan.
I think reading these answers might help:
In general, the selection filters can use indexes on user_id or activity_type_id or both (in either order). The ordering operation might be able to use an index on created_at.
It is likely that for this query, a composite index on (user_id, activity_type_id) would give the best result, assuming that MySQL can actually make use of it. Failing that, it is likely to be better to index user_id than activity_type_id because it is likely to provide better selectivity. One reason for thinking that is that there would be 4 subsections of the index to scan if it uses an index on activity_type_id, compared with just one subsection to scan if it uses an index on user_id alone.
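As a rough sketch of that suggestion (the table and column names come from the question; the index names are made up for illustration):

```sql
-- Composite index covering both filters in the WHERE clause.
-- The index name is hypothetical; use whatever fits your naming scheme.
CREATE INDEX idx_activities_user_type
    ON activities (user_id, activity_type_id);

-- Fallback if MySQL will not use the composite index: index user_id alone.
-- CREATE INDEX idx_activities_user ON activities (user_id);

-- List the indexes that now exist on the table.
SHOW INDEX FROM activities;
```

Since user_id is the leftmost column of the composite index, that index can also serve queries that filter on user_id alone, so you would normally not need both.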
Trying to rely on an index for the sort order is likely to mean a full table scan, so it is less likely to be beneficial. I would not create an index on created_at to support this query; there might be other queries where it would be beneficial.
You are doing a lookup on user_id and activity_type_id, so create indexes on both columns.
Assuming you're not hiding a JOIN in the actual production code, indexing activity_type_id should be the best option.
I'd add two indexes on the activities table, one on (user_id, activity_type_id) and the other on (created_at). I'd also look really hard at just which fields from the activities table are actually used; if you can reduce the number of fields retrieved you may improve response time. I'd also grab the query plan before making any changes to the database, then compare it to the plan generated after making any or all of the changes.
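The before/after comparison could look something like this. It is only a sketch: the index names are invented, and the narrowed column list in the last step assumes those three columns are all the application needs.

```sql
-- 1. Capture the plan before touching the schema.
EXPLAIN
SELECT *
FROM activities
WHERE user_id = 90000
  AND activity_type_id IN (300, 400, 808, 9494)
ORDER BY created_at DESC
LIMIT 70;

-- 2. Add the two candidate indexes (hypothetical names).
ALTER TABLE activities
    ADD INDEX idx_user_type (user_id, activity_type_id),
    ADD INDEX idx_created (created_at);

-- 3. Re-run EXPLAIN, this time naming only the columns actually needed
--    (assumed here), and compare the key, rows and Extra columns.
EXPLAIN
SELECT user_id, activity_type_id, created_at
FROM activities
WHERE user_id = 90000
  AND activity_type_id IN (300, 400, 808, 9494)
ORDER BY created_at DESC
LIMIT 70;
```

If the key column of the second plan does not show one of the new indexes, that index is not helping this query and can be dropped again with DROP INDEX.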
Share and enjoy.
I wouldn't create any additional indexes at all; instead I would have designed my table so it takes full advantage of an InnoDB clustered primary key!
create table activities
(
user_id int unsigned not null,
activity_id smallint unsigned not null,
primary key (user_id, activity_id) -- composite clustered primary key order is important
)
engine=innodb;
or
create table activities
(
user_id int unsigned not null,
activity_id smallint unsigned not null,
primary key (activity_id, user_id) -- hmmmm the other way round, why is that ?
)
engine=innodb;
Also, have a read of the following:
MySQL and NoSQL: Help me to choose the right one
How to avoid "Using temporary" in many-to-many queries?
60 million entries, select entries from a certain month. How to optimize database?
Rewriting mysql select to reduce time and writing tmp to disk
Hope it helps and remember innodb FTW ;)
To make the right decision you have to take into consideration the following:
If the user_id is part of the primary key (you said it probably is), then is the primary key the clustered index for the table? If it is, is the user_id in the first position in the clustered index? If so, then how many activities do you expect to have per user? If there are 1-40 activities per user, then adding another index wouldn't be useful and would hurt insert performance. The reason is that all of the activity rows for a user will be clustered together and will likely be on the same database page, so adding activity_type_id to the index won't help.
If the primary key is not clustered and the user_id is not in the first position in the primary key, or the user_id is not in the primary key, then your best bet is a non-clustered index on user_id, activity_type_id. The query optimizer should be smart enough to use the index, since both user_id and activity_type_id are in the WHERE clause, even with the presence of the IN clause. You could also add created_at at the end of the index, since you are ordering the query results that way.
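A sketch of that three-column variant, with a made-up index name. One caveat worth checking on your own data: because activity_type_id is matched with IN, the index entries are ordered by created_at only within each activity_type_id value, so MySQL may still have to sort the rows.

```sql
-- Hypothetical index name; columns are in the order suggested above.
CREATE INDEX idx_activities_user_type_created
    ON activities (user_id, activity_type_id, created_at);

-- Check whether the sort is avoided: if the Extra column still shows
-- "Using filesort", the trailing created_at is not being used for ordering.
EXPLAIN
SELECT *
FROM activities
WHERE user_id = 90000
  AND activity_type_id IN (300, 400, 808, 9494)
ORDER BY created_at DESC
LIMIT 70;
```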
Take care in creating indexes specifically for one query, but if the query is heavily used, it's often necessary.