A join here seems unnecessary

Question

TableA
------
id
Name
other_fields

TableB
------
A_id (foreign key to TableA.id)
other_fields

Select entries from TableB which reference entries in TableA with some specific property (e.g. Name = "Alice")

This can be easily done with a join:

SELECT TableB.*
FROM TableA INNER JOIN TableB on TableA.id = TableB.A_id
WHERE TableA.Name = "Alice"

Being used to procedural programming, the join seems overkill and unnecessary as we don't actually need any information from TableA other than the id of Alice.

So -- assuming that Alice is unique -- is there a way to do this (pseudocode):

variable alice_id = get id of Alice from TableA

SELECT *
FROM TableB
WHERE A_id = alice_id

If yes, should it be used in favor of the classical JOIN method? Is it faster? (in principle, of course)

Related: http://stackoverflow.com/questions/5274618/inner-join-and-where-in-clause-performance — Giorgos Betsos, Nov 05 '15 at 21:28

score 3 · Accepted Answer · answered Nov 05 '15 at 21:28

3

You're asking if you can do this:

SELECT * FROM TableB WHERE A_id = (SELECT id FROM TableA WHERE Name = 'Alice');

It's a perfectly legitimate query, but MySQL will perform much better doing the join because the subquery is treated as a second separate query. Using the MySQL EXPLAIN command (just put it in front of your SELECT query) will show the indexes, temporary tables, and other resources that are used for a query. It should give you an idea when one query is faster or more efficient than another.

answered Nov 05 '15 at 21:28

miken32

42,008
16
111
154

being used to C++, to me JOINS seem the worst evil. I mentally associate them with creating a new matrix out a 2 matrices. Need to wrap my head around sql's way of doing things. – bolov Nov 05 '15 at 21:50
SQL is a different paradigm. Relational databases are optimized to handle joins. Using procedural code to avoid a join is pretty much an anti-pattern. – APC Nov 08 '15 at 00:21
Phrased another way... C++ can work only with scalars; SQL is optimized to work with vectors. Do things en masse, do not iterate. – Rick James Nov 28 '15 at 14:56

score 1 · Answer 2 · answered Nov 05 '15 at 21:37

For your workload and indexes, you should try both queries' execution plan and runtime. In either case you would benefit from having an index on name.

I believe that both queries are going to end up with similar plans. Let's check that out.

Create the tables

create table tablea (id int primary key, nm as varchar(50));
create index idx_tablea_nm on tablea(nm);
create table tableb(a_id int, anotherfield varchar(100), 
      key idx_tableb_id(a_id), 
      constraint fk_tableb_tablea_id foreign key (a_id) references tablea (id));

Let's do an EXPLAIN on the first one:

explain select tableb.* from tablea inner join tableb on tablea.id = tableb.a_id where tablea.nm = 'Alice';
+----+-------------+--------+------+-----------------------+---------------+---------+-------------------+------+--------------------------+
| id | select_type | table  | type | possible_keys         | key           | key_len | ref               | rows | Extra                    |
+----+-------------+--------+------+-----------------------+---------------+---------+-------------------+------+--------------------------+
|  1 | SIMPLE      | tablea | ref  | PRIMARY,idx_tablea_nm | idx_tablea_nm | 53      | const             |    1 | Using where; Using index |
|  1 | SIMPLE      | tableb | ref  | idx_tableb_id         | idx_tableb_id | 5       | tablea.id         |    1 | Using where              |
+----+-------------+--------+------+-----------------------+---------------+---------+-------------------+------+--------------------------+

Let's do EXPLAIN on the second one:

explain select * from tableb where a_id = (select id from tablea where nm = 'Alice');
+----+-------------+--------+------+---------------+---------------+---------+-------+------+--------------------------+
| id | select_type | table  | type | possible_keys | key           | key_len | ref   | rows | Extra                    |
+----+-------------+--------+------+---------------+---------------+---------+-------+------+--------------------------+
|  1 | PRIMARY     | tableb | ref  | idx_tableb_id | idx_tableb_id | 5       | const |    1 | Using where              |
|  2 | SUBQUERY    | tablea | ref  | idx_tablea_nm | idx_tablea_nm | 53      |       |    1 | Using where; Using index |
+----+-------------+--------+------+---------------+---------------+---------+-------+------+--------------------------+

I don't have much data in those tables and with little data you will notice identical performance. As the workload changes, the execution play may change.

A join here seems unnecessary

2 Answers2