When is it better not to use inner join?

Question

I have two tables:

table1 (id, name, connte)
table2 (id, name, connte)

They are connected through table1.connte and table2.connte. And each of them contains 100 records.

Now if I want to delete a record from table1 with id = 20 and its corresponding children in table2, is it better to do the following:

   DELETE d1,d2 FROM table1 d1 INNER JOIN table2 d2 ON d1.connte= d2.connte WHERE d1.id = 20

or the following:

  select connte from table1 where id = 20
  --Store connte in a variable say aabc--
  delete from table2 where connte = aabc   -> execute this first
  delete from table1 where id = 20    -> execute this second

If there is only one parent and one child for a record that I want to delete (here table1.id =20), then isn't it expensive to do inner join for the whole table?

I am running this query from JAVA (so JDBC), so is it more expensive (performance wise) to run multiple queries or an inner join, for the above mentioned condition?

NOTE: Assuming no referential integrity for the tables. So, I am not using cascade delete.

score 6 · Accepted Answer · answered Jun 05 '12 at 14:04

6

It will probably be faster to do it in one query, rather than two. You are basically trying to do optimizations yourself instead of letting the DBMS do it, and generally DBMS' are really good at this sort of stuff.

Also, you probably won't have to worry about delete performance for tables this small. 100 x 100 rows is still very small, so your DBMS should be able to handle this without a problem.

answered Jun 05 '12 at 14:04

Oleksi

12,947
4
56
80

Hmm..ok, but what about when I have, say a hundred thousand or a million records? Which approach would then be better for my mentioned condition? – Harke Jun 05 '12 at 14:29
I would still recommend letting the DBMS handle it. It knows how to do these optimizations better than you probably can. – Oleksi Jun 05 '12 at 14:32
Is there any way I can know how inner join works? Won't joining two tables with million records each, take time? (even after filtering with the ON clause, there is a possibility that there are a lot of records to join) – Harke Jun 05 '12 at 14:37
@Harke, if you are deleting millions of records, you want to do that in batches anyway. And joins are fast if you have properly indexed. – HLGEM Jun 05 '12 at 14:42
The DBMS generally chooses to do the join in a way that doesn't lead to huge amounts of intermediate result records. As mentioned in other answers, you can see exactly what kind of join was executed by using `showplan` – Oleksi Jun 05 '12 at 14:46
@HLGEM, I am just deleting one record each from table1 and table2. But each in itself have a million records. So, with this condition, I wanted to know if inner join was a good idea or not. – Harke Jun 05 '12 at 14:55
It does take time to go through millions of records but a solid schema design and proper querying can really limit the hit on it. You do want to try and do it in one query though to limit the connections though. Assuming id is your primary key and also acts as an index then an inner join shouldn't be a burden. There may be a better way but there isn't a solution thats going to be instant – Tony318 Jun 05 '12 at 18:36

Paul Sanwald · Answer 2 · 2012-06-05T17:26:46.557

The best way to do this is to run an explain plan on both queries in your DBMS, and the output gives you things like I/O estimates, etc. In general, anytime you have a question about what will be more efficient from a DBMS perspective, looking at query plans and cost estimates is your best weapon.

SET showplan on

should work in SQL server. Here's the MSDN documentation, you may need a slightly different syntax depending on your DBMS. I/O Cost estimate is probably what you care about most.

for MySQL, it looks like you can use

EXPLAIN SELECT * FROM some_table

Here is a tutorial on analyzing queries using Explain.

In general, Oleksi is correct; most optimizers are very good at this kind of thing and manually creating a temp table isn't going to buy you very much. Many times the optimizer's query plan will involve creating a temp table.

For a delete query, one important thing to ensure for speed is that your delete clause is using indexed parameters, thus not causing a full table scan. Explain/Showplan will tell you what type of scan your query is doing, and what indices it is using. You generally want to avoid full table scans.

score 1 · Answer 3 · answered Jun 05 '12 at 14:42

1

In Sql Server you cannot delete from two tables in one delete statement (without a trigger or cascade delete).

answered Jun 05 '12 at 14:42

HLGEM

94,695
15
113
186

So, in my case (which has no referential integrity and no triggers), only the second condition is possible? – Harke Jun 05 '12 at 15:01

score 0 · Answer 4 · edited May 23 '17 at 10:30

0

Why not consider using foreign keys and cascading deletes ? If you set your table properly you would simply have to delete the parent and then the child automatically gets taken care of. See this link When/Why to use Cascading in SQL Server?

edited May 23 '17 at 10:30

Community

1
1

answered Jun 05 '12 at 15:26

Stainedart

1,929
4
32
53

When is it better not to use inner join?

4 Answers4