23

How important is it to avoid nested queries?

I was always taught to avoid them like the plague, but they are the most natural thing to me. When I am designing a query, the first thing I write is a nested query. Then I convert it to joins, which sometimes takes a lot of time to get right and only rarely gives a big performance improvement (though sometimes it does).

So are they really so bad? Is there a way to use nested queries without temp tables and filesort?

hobodave
Midhat
    I doubt it matters at all performance-wise, at least in MS SQL Server. MySQL isn't quite as smart... – Thorarin May 06 '10 at 06:09

4 Answers

10

It really depends. I have had situations where I improved some queries by using subqueries.

The factors that I am aware of are:

  • whether the subquery uses fields from the outer query for comparison (that is, whether it is correlated or not)
  • whether the relation between the outer query and the subquery is covered by indexes
  • if there are no usable indexes for the joins, and the subquery is not correlated and returns a small result, the subquery might be faster
  • I have also run into situations in MySQL where performance improved after removing the ORDER BY from a query, wrapping the result in a simple subquery, and sorting that in the outer query
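To make the correlated vs. join distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and the index name are made up for illustration, and SQLite's planner is not MySQL's, so this only shows that the two forms return the same rows, not which one is faster.

```python
import sqlite3

# Hypothetical schema, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0);
""")

# Correlated form: the inner query refers to c.id from the outer query.
correlated = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS n_orders
    FROM customers c
    ORDER BY c.id
""").fetchall()

# Join form: LEFT JOIN keeps customers without orders; COUNT(o.id) skips NULLs.
joined = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(correlated)
print(joined)
```

Both queries should give the same per-customer counts. Which one runs faster depends on the engine, the indexes, and the data, which is why it is worth testing both.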

Anyway, it is always good to test different variants (with SQL_NO_CACHE please), and turning correlated queries into joins is a good practice.

I would even go so far to call it a very useful practice.

If correlated queries are the first thing that comes to your mind, it might be that you are thinking primarily in terms of procedural operations rather than set operations. When dealing with relational databases, it is very useful to fully adopt the set perspective on the data model and the transformations on it.

EDIT: Procedural vs Relational
Thinking in terms of set operations vs. procedures boils down to the equivalence of certain set algebra expressions. For example, a selection on a union is equivalent to a union of selections; as expressions, there is no difference between the two.
But when you compare the two as procedures, such as "apply the selection criteria to each input and then union the results" vs. "make the union and then apply the selection", they are distinctly different procedures, which might have very different properties (for example utilization of CPU, I/O, memory).
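A small sketch of that equivalence, again with Python's sqlite3 module. The tables `a` and `b` and the predicate `x > 3` are invented for the example; the point is only that both formulations describe the same set.

```python
import sqlite3

# Two hypothetical single-column tables with overlapping values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

# Selection applied to the union.
select_after_union = conn.execute("""
    SELECT x FROM (SELECT x FROM a UNION SELECT x FROM b) AS u
    WHERE x > 3
    ORDER BY x
""").fetchall()

# Union of the two selections.
union_of_selects = conn.execute("""
    SELECT x FROM a WHERE x > 3
    UNION
    SELECT x FROM b WHERE x > 3
    ORDER BY x
""").fetchall()

print(select_after_union)
print(union_of_selects)
```

The result sets are identical, but an engine that ran each query literally as written would do different amounts of work: the first builds the whole union before filtering, the second filters each input first.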

The idea behind relational databases is that you do not try to describe how to get the result (the procedure), but only what you want, and that the database management system will decide on the best path (procedure) to fulfil your request. This is why SQL is called a fourth-generation language (4GL).

One of the tricks that helps you do that is to remind yourself that tuples have no inherent order (set elements are unordered). Another is realizing that relational algebra is quite comprehensive and allows you to translate requests (requirements) directly into SQL, provided the semantics of your model represent the problem space well; in other words, provided the meaning attached to the names of your tables and relationships is right; in other words, provided your database is designed well.

Therefore, you do not have to think how, only what.

In your case, it was just preference over correlated queries, so it might be that I am not telling you anything new, but you emphasized that point, hence the comment.

I think that if you were completely comfortable with all the rules that transform queries from one form into another (rules such as distributivity), you would not prefer correlated subqueries; you would see all forms as equal.

(Note: the above discusses the theoretical background, which is important for database design. In practice things deviate from these concepts: not all equivalent rewrites of a query necessarily execute equally fast, clustered primary keys do give tables an inherent order on disk, and so on. But these are only deviations; the fact that not all equivalent queries execute equally fast is an imperfection of the actual DBMS and not of the concepts behind it.)

Unreason
  • Do you have an example of "thinking in terms of set operations" vs "in terms of procedural operations"? – Midhat May 12 '10 at 07:17
  • @Midhat: Commented in the answer (EDIT). Sorry if not what you wanted, were you looking for more practical examples or an explanation such as above? – Unreason May 12 '10 at 08:37
4

Personally I prefer to avoid nested queries until they are necessary for the simple reason that nested queries can make the code less human readable and make debugging and collaboration more painful. I think nesting is acceptable if the nested query is something trivial or if temporary storage of large tables becomes an issue. But too many times I've seen complex nested queries within nested queries and it makes debugging painful.

Anto
1

I'm not sure what it looks like in MySQL 5.1 or 5.5, but in 5.0.x nested queries usually have horrible performance, because MySQL executes the subquery for each row fetched from the main query. This probably isn't the case for more mature databases like MS SQL Server, which can internally rewrite nested queries to joins, but I've never used it, so I don't know for sure.

http://dev.mysql.com/doc/refman/5.0/en/rewriting-subqueries.html

"It is also true that on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries." That is a rather funny statement, considering that for me, so far, all subqueries have made the database crawl.
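As a rough illustration of the kind of rewrite that page describes, here is an `IN (subquery)` filter next to its join form, run through Python's sqlite3 module. The tables `t1` and `t2` and their contents are made up, and SQLite stands in for MySQL here, so this shows the equivalence of the results only, not the 5.0.x performance problem.

```python
import sqlite3

# Hypothetical tables; t2 deliberately contains a duplicate id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, label TEXT);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (2), (2), (3), (9);
""")

# Subquery form: keep rows of t1 whose id appears in t2.
with_subquery = conn.execute("""
    SELECT id, label FROM t1
    WHERE id IN (SELECT id FROM t2)
    ORDER BY id
""").fetchall()

# Join form: DISTINCT removes the extra row caused by the duplicate id in t2.
with_join = conn.execute("""
    SELECT DISTINCT t1.id, t1.label
    FROM t1
    JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

print(with_subquery)
print(with_join)
```

Note that the DISTINCT in the join form would also collapse genuinely duplicated rows in `t1`, so the two forms are only interchangeable when the rows of `t1` are unique.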

Subqueries vs joins

Community
Tomasz Zieliński
0

I try to avoid nested queries because they're less readable.

But I do agree that they are easier to write. I mean, it's just easier to conceptualize when writing the code, IMO. But then, deep nesting makes reading the code very difficult. Make sure you add a comment telling the reader what the subquery is doing, so they can skip over it if they don't need the details.

Also, as soon as it starts getting difficult to read, you might want to consider converting the subquery into a Common Table Expression. The conversion is easy to do, and also makes it much easier to read, since each CTE has a specific purpose.
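For example, a nested derived table and the same query written with a CTE might look like the sketch below, run here through Python's sqlite3 module. The `orders` table, the `per_customer` name, and the threshold of 20 are invented for illustration, and a CTE requires an engine that supports `WITH`.

```python
import sqlite3

# Hypothetical orders table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 2, 5.0), (4, 3, 40.0);
""")

# Nested form: the aggregation is buried inside the FROM clause.
nested = conn.execute("""
    SELECT customer_id, spent
    FROM (SELECT customer_id, SUM(total) AS spent
          FROM orders
          GROUP BY customer_id) AS per_customer
    WHERE spent > 20
    ORDER BY customer_id
""").fetchall()

# CTE form: the same aggregation gets a name and a comment up front.
with_cte = conn.execute("""
    WITH per_customer AS (
        -- total spent by each customer
        SELECT customer_id, SUM(total) AS spent
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, spent
    FROM per_customer
    WHERE spent > 20
    ORDER BY customer_id
""").fetchall()

print(nested)
print(with_cte)
```

The two queries return the same rows; the CTE version just reads top to bottom, with each named step stating its purpose.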

NL3294