1

I have tables in a MySQL database like this...

CREATE TABLE `someUserTable` 
(
userId INT NOT NULL,
.....
PRIMARY KEY(userId), 
);

CREATE TABLE `someActivityTable` 
(
activityId INT NOT NULL
userId INT,
.....
PRIMARY KEY(activityId),
KEY(userId) 
);

And if I want to find all the activity for a user (let say '100'), is it more efficient to

SELECT * 
FROM someUserTable U 
LEFT JOIN someActivityTable A ON A.userId = U.userId 
WHERE U.userId = 100;

or

SELECT * 
FROM someUserTable U 
LEFT JOIN 
(SELECT * FROM someActivityTable WHERE userId = 100) A ON A.userId = U.userId 
WHERE U.userId = 100;

-----Edit-----

Using a profiler will tell me what is best for the current situation. But what I am interested in is the theory behind the 2 queries.

So that in future, I can know which query is better when doing development.

Otherwise, I need to want until the application goes into production and the database starts filling up before I can do an accurate profiling. And by that time, it might be too late.

iWantSimpleLife
  • 1,944
  • 14
  • 22
  • 1
    My gut feeling is that `LEFT JOIN` should be faster, but I can't back this up with anything right now so I'll let someone else answer ;) – Polynomial Nov 18 '11 at 09:36
  • EXPLAIN on the first sql statement shows that both tables are accessed using indexes (U using primary, and A using key). – iWantSimpleLife Nov 18 '11 at 09:38
  • EXPLAIN on the Second sql statement shows that both tables are accessed using indexes (U using primary, and A using key, and a third row not using any key). – iWantSimpleLife Nov 18 '11 at 09:38
  • 1
    So what I don't understand is why some programmers insisted on writing it the second way – iWantSimpleLife Nov 18 '11 at 09:39

4 Answers4

2

There is no such thing as a "most efficient" statement because

  1. SQL query optimizers function as black boxes which make decisions based on many data points, decisions that may not be predictable or even apparently consistent.

  2. The most efficient statement could change depending on the amount of data, the presence of indexes, and the cardinality of those indexes so that this mornings "most efficient" choice may not be the same tomorrow for identically structured tables.

In this case, however, we can say that version 1 is superior to version 2 in that it is shorter, easier to understand, and does not contain duplicated filtering logic. Stick with version 1 unless and until you determine that version 2 is a necessary performance hack.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
1

I would say the first one but you can find out with an SQL profiler. If you use MS SQL for example the "SQL Server Profiler" comes with the database.

juergen d
  • 201,996
  • 37
  • 293
  • 362
1

Choose the first one, because it is straight forward. premature optimization is the root of all evil.

As long as there is no real performance problem keep it simple and straight forward. When you've a performance problem then please show us explain plan and time that each statement took.

If you're really aware of performance in productoin then you've to test it before it goes into production. You can do profiling before going into production with filling data on test system.

From developer's point of view the first one is better, because it is not error-prone like the second statement and more straight forward. Keep it simple!

Fabian Barney
  • 14,219
  • 5
  • 40
  • 60
0

Depending on the number of columns that someActivityTable has it can be be the first because in the second one you are selecting all data:

SELECT * FROM someActivityTable WHERE userId = 100

Although it'll be almost the same thing you should use the first one.

aF.
  • 64,980
  • 43
  • 135
  • 198