Which SQL statement is more efficient?

Question

I have tables in a MySQL database like this...

CREATE TABLE `someUserTable` 
(
userId INT NOT NULL,
.....
PRIMARY KEY(userId), 
);

CREATE TABLE `someActivityTable` 
(
activityId INT NOT NULL
userId INT,
.....
PRIMARY KEY(activityId),
KEY(userId) 
);

And if I want to find all the activity for a user (let say '100'), is it more efficient to

SELECT * 
FROM someUserTable U 
LEFT JOIN someActivityTable A ON A.userId = U.userId 
WHERE U.userId = 100;

or

SELECT * 
FROM someUserTable U 
LEFT JOIN 
(SELECT * FROM someActivityTable WHERE userId = 100) A ON A.userId = U.userId 
WHERE U.userId = 100;

-----Edit-----

Using a profiler will tell me what is best for the current situation. But what I am interested in is the theory behind the 2 queries.

So that in future, I can know which query is better when doing development.

Otherwise, I need to want until the application goes into production and the database starts filling up before I can do an accurate profiling. And by that time, it might be too late.

My gut feeling is that `LEFT JOIN` should be faster, but I can't back this up with anything right now so I'll let someone else answer ;) — Polynomial, Nov 18 '11 at 09:36
EXPLAIN on the first sql statement shows that both tables are accessed using indexes (U using primary, and A using key). — iWantSimpleLife, Nov 18 '11 at 09:38
EXPLAIN on the Second sql statement shows that both tables are accessed using indexes (U using primary, and A using key, and a third row not using any key). — iWantSimpleLife, Nov 18 '11 at 09:38
So what I don't understand is why some programmers insisted on writing it the second way — iWantSimpleLife, Nov 18 '11 at 09:39

score 2 · Accepted Answer · answered Nov 18 '11 at 11:12

There is no such thing as a "most efficient" statement because

SQL query optimizers function as black boxes which make decisions based on many data points, decisions that may not be predictable or even apparently consistent.
The most efficient statement could change depending on the amount of data, the presence of indexes, and the cardinality of those indexes so that this mornings "most efficient" choice may not be the same tomorrow for identically structured tables.

In this case, however, we can say that version 1 is superior to version 2 in that it is shorter, easier to understand, and does not contain duplicated filtering logic. Stick with version 1 unless and until you determine that version 2 is a necessary performance hack.

score 1 · Answer 2 · answered Nov 18 '11 at 09:37

1

I would say the first one but you can find out with an SQL profiler. If you use MS SQL for example the "SQL Server Profiler" comes with the database.

answered Nov 18 '11 at 09:37

juergen d

201,996
37
293
362

Well, I am using MySQL. And there is no profiller that comes with it. :( – iWantSimpleLife Nov 18 '11 at 09:40
Oh Good. Normally I just do EXPLAIN and try to make heads or tails from the results... Thanks. – iWantSimpleLife Nov 18 '11 at 09:47
Query Profiler for MySQL - http://www.devart.com/dbforge/mysql/studio/query-profiler.html – Devart Nov 18 '11 at 09:52

Fabian Barney · Answer 3 · 2011-11-18T11:10:35.827

Choose the first one, because it is straight forward. premature optimization is the root of all evil.

As long as there is no real performance problem keep it simple and straight forward. When you've a performance problem then please show us explain plan and time that each statement took.

If you're really aware of performance in productoin then you've to test it before it goes into production. You can do profiling before going into production with filling data on test system.

From developer's point of view the first one is better, because it is not error-prone like the second statement and more straight forward. Keep it simple!

score 0 · Answer 4 · answered Nov 18 '11 at 09:47

0

Depending on the number of columns that someActivityTable has it can be be the first because in the second one you are selecting all data:

SELECT * FROM someActivityTable WHERE userId = 100

Although it'll be almost the same thing you should use the first one.

answered Nov 18 '11 at 09:47

aF.

64,980
43
135
198

Assuming both tables have millions of records. And there are at least 500 activities for each user? – iWantSimpleLife Nov 18 '11 at 09:58
with so many records I'd to test it. But do like @juergend said and user SQL profiler to test it :) – aF. Nov 18 '11 at 10:00

Which SQL statement is more efficient?

4 Answers4