SELECT .. FROM (SELECT .. FROM ..). How can I improve this query?

Question

I'm learning SQL and have been going through the chapters of the GalaXQL app.

I've written the following query in answer to the question "Hilight the star (or stars) which has the planet with the highest orbit distance in the galaxy. Remember to clear the old hilights before beginning."

This query returns the result starid:23334, which when inserted into the hilight table, allows me to progress. However, the program hangs for a long, long time before handing this result back to me, so I suspect there's a far more efficient way to pull this information from the database.

This query works, but takes a long time to process, how can I improve this query?

INSERT INTO hilight 
SELECT DISTINCT s.starid 
FROM planets AS p, stars AS s, moons AS m 
WHERE s.starid=(SELECT starid 
                FROM planets 
                WHERE orbitdistance=(SELECT MAX(orbitdistance) 
                                     FROM planets));

My logic behind this query structure

First find the planet with the greatest orbit distance from the table "planets".

Second to compare the value of "highest orbit distance" to the "orbitdistance" field of the table "planets" and to return the "starid" that relates to that field.

Third to compare the value of the field "starid" in the table "planets" to the field "starid" in the table "stars" and then insert that "starid" to the table "hilight".

Data structure:

enter image description here

Try changing `SELECT DISTINCT s.starid FROM planets AS p, stars AS s, moons AS m WHERE...` to: `SELECT s.starid FROM stars AS s WHERE...` — ypercubeᵀᴹ, Oct 08 '14 at 14:11
Problem number 1 is that you are using [ancient join syntax.](https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-old-style-joins) aslo why are you doing a cross join? Shouldn't you have an `ON` condition for those joins? If you did you probably wouldn't need that distinct either. — Zane, Oct 08 '14 at 14:13
@ypercube: That would result in lots of copies of "starid" being passed to "hilight". As "starid" in the "hilight" table is a unique integer, this causes the error "column starid is not unique" and doesn't allow me to progress. However, DISTINCT does seem to be the issue, running the query without DISTINCT and without it attempting to insert the data into the hilight table runs successfully and quickly. — AidenWebb, Oct 08 '14 at 14:18
@Awebb the problem does say "(or stars)" so we expect there to be more than one result. — Hogan, Oct 08 '14 at 14:19
@Awebb How can `SELECT s.starid FROM stars AS s WHERE...` ever result in duplicates? Not possible. — ypercubeᵀᴹ, Oct 08 '14 at 14:20
@Awebb you get many values because you are doing a `cross join`. Which brings me back to my original point. You need to fix your `JOIN`'s! — Zane, Oct 08 '14 at 14:20
@Zane : I'm not sure what Cross Joins and ON conditions are yet, I'm only just starting to learn SQL and was told GalaSQL gave a good basic understanding and interactive tutorial. Can you recommend any other apps/tutorials/guides that aren't so outdated? — AidenWebb, Oct 08 '14 at 14:21

Zane · Accepted Answer · 2014-10-08T15:06:43.423

Alright let's start by looking at your base query. I know you already have an answer that works but I feel I need to explain to you what was going on in your query.

INSERT INTO hilight 
SELECT DISTINCT 
    s.starid 
FROM planets AS p, stars AS s, moons AS m 
WHERE s.starid=(
    SELECT 
        starid 
    FROM planets 
    WHERE orbitdistance=(
            SELECT 
                MAX(orbitdistance) 
            FROM planets
            )
        );

So here in your FROM clause you are also using several of the old style joins. You also are missing an ON clause. This is referred to as a CROSS JOIN which will produce what we refer to as a Cartesian product. The result set will the number of rows in the first table multiplied by the number of rows in the second table and so on and so on.

So we can fix that by simply fixing the join syntax and joining on the relationships you've shown in your diagram.

INSERT INTO hilight 
SELECT --DISTINCT probably no longer needed as we aren't artifficially creating duplicate rows
    s.starid 
FROM planets AS p
inner join stars AS s
    on s.StarID = p.starid 
inner join moons AS m 
    m.planetID = p.planetID
WHERE s.starid=(
    SELECT 
        starid 
    FROM planets 
    WHERE orbitdistance=(
            SELECT 
                MAX(orbitdistance) 
            FROM planets
            )
        );

Upon further analysis you are joining to the table moons but are not using any of the data nor is it limiting your result set. This means you aren't gaining any benefit from having this in your query and can be cut right out.

INSERT INTO hilight 
SELECT --DISTINCT probably no longer needed as we aren't artifficially creating duplicate rows
    s.starid 
FROM planets AS p
inner join stars AS s
    on s.StarID = p.starid 
WHERE s.starid=(
    SELECT 
        starid 
    FROM planets 
    WHERE orbitdistance=(
            SELECT 
                MAX(orbitdistance) 
            FROM planets
            )
        );

Now upon further analysis if we look at your WHERE clause it seems that it is quite redundant. There seems to me to be no reason to go to the planet table twice to get your predicate when you could simply match the max orbit distance to the planets table. This also eliminates any reason to join to table stars as well.

INSERT INTO hilight 
SELECT 
    p.starid 
FROM planets AS p
WHERE p.orbitdistance=
    (
    SELECT 
        MAX(orbitdistance) 
    FROM planets
    )

The resulting query is much simpler and should run alot faster now that we aren't generating so many duplicate rows. I hope that sheds some light on what was happening in your query.

UPDATE: Upon further review this GalaSQL appears to be quite terrible and has massively outdated information and I would highly recommend not using it.

That's very helpful and makes a lot of sense, thank you. Through GalaXQL it seemed to me that the value had to be passed through each table before being inserted into hilight. It's obviously not the case and I'm somewhat relieved that I won't have to be dealing with that particular programming obstacle. — AidenWebb, Oct 08 '14 at 14:57

score 1 · Answer 2 · edited Oct 08 '14 at 14:35

1

You could remove the first select such that it would liook like...

INSERT INTO hilight
SELECT DISTINCT p.starid 
FROM planets p
WHERE orbitdistance=(
    SELECT MAX(orbitdistance) 
    FROM planets)
 ;

You could also remove the distinct, unless you had a specific reason for including it.

edited Oct 08 '14 at 14:35

ypercubeᵀᴹ

113,259
19
174
235

answered Oct 08 '14 at 14:17

r1chjc

136
5

This works much more efficiently, thanks. Looks like I was vastly over thinking the process of passing between tables. – AidenWebb Oct 08 '14 at 14:27

score 0 · Answer 3 · answered Oct 08 '14 at 14:31

0

you can use CROSS Apply too.

INSERT INTO hilight 
SELECT P.startid
FROM planets P
CROSS APPLY ( SELECT MAX(orbitdistance) as maxVal FROM planets ) T
WHERE P.orbitdistance = T.maxVal

answered Oct 08 '14 at 14:31

radar

13,270
2
25
33

SELECT .. FROM (SELECT .. FROM ..). How can I improve this query?

3 Answers3