8

I've inherited some tables that I'm trying to clean up, but first I'm trying to join everything I need. I'm having problems because there is more than one way to get to the table SpecialEvents through EventRegistrations.

In some cases EventRegistrations can be joined directly using event_registrations.scoreable_id, while in other cases you must first join another table, SpecialPlaces. You can tell which way to go via event_registrations.scoreable_type, which is either SpecialEvent or SpecialPlace.

Basically, how can I join SpecialEvents in the case where I must also join SpecialPlaces first? E.g. if I try to join SpecialEvents in two different ways I get the error: "table name "special_events" specified more than once".

SELECT event_registrations.id,
       array_agg(teams.name),
       event_registrations.number_of_players,
       event_registrations.state,
       event_registrations.created_at,
       array_agg(players.email),
       array_agg(special_events.name),
       array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_events ON event_registrations.scoreable_id = special_events.id AND event_registrations.scoreable_type = 'SpecialEvent'
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at

SpecialEvents

+----+-----------+---------------------------+-----------+---------------------------+
| id | region_id | start_at                  | state     | created_at                |
+----+-----------+---------------------------+-----------+---------------------------+
| 2  | 1         | 2015-10-22 19:30:00 +0100 | published | 2015-09-21 09:41:05 +0100 |
| 4  | 1         | 2016-01-21 19:30:00 +0000 | published | 2015-11-26 15:11:25 +0000 |
| 3  | 1         | 2016-01-28 19:30:00 +0000 | published | 2015-11-23 16:16:27 +0000 |
| 5  | 1         | 2016-12-31 19:30:00 +0000 | draft     | 2016-02-24 15:17:22 +0000 |
| 6  | 1         | 2016-05-16 19:30:00 +0100 | published | 2016-03-29 14:33:40 +0100 |
| 10 | 1         | 2016-09-12 19:30:00 +0100 | published | 2016-06-28 17:18:54 +0100 |
| 8  | 1         | 2016-10-07 19:30:00 +0100 | draft     | 2016-06-09 15:03:36 +0100 |
| 7  | 1         | 2016-05-23 19:30:00 +0100 | published | 2016-03-30 19:30:21 +0100 |
| 9  | 1         | 2016-08-04 19:30:00 +0100 | published | 2016-06-09 15:18:56 +0100 |
| 11 | 1         | 2016-11-07 19:30:00 +0000 | draft     | 2016-07-11 17:20:11 +0100 |
+----+-----------+---------------------------+-----------+---------------------------+

SpecialPlaces

+----+------------------+----------+---------------------------+
| id | special_event_id | place_id | created_at                |
+----+------------------+----------+---------------------------+
| 1  | 2                | 243      | 2015-10-12 18:07:09 +0100 |
| 3  | 2                | 83       | 2015-10-15 15:54:40 +0100 |
| 5  | 4                | 262      | 2015-11-26 16:29:35 +0000 |
| 4  | 3                | 262      | 2015-11-23 16:25:31 +0000 |
| 6  | 5                | 281      | 2016-02-24 15:20:33 +0000 |
| 7  | 6                | 262      | 2016-03-29 14:34:00 +0100 |
| 8  | 7                | 262      | 2016-04-11 13:28:00 +0100 |
| 9  | 8                | 262      | 2016-06-09 15:03:52 +0100 |
| 12 | 11               | 262      | 2016-07-11 17:20:26 +0100 |
| 10 | 9                | 262      | 2016-06-09 15:20:08 +0100 |
+----+------------------+----------+---------------------------+

EventRegistrations

+----+---------+--------------+----------------+-------+---------------------------+
| id | team_id | scoreable_id | scoreable_type | state | created_at                |
+----+---------+--------------+----------------+-------+---------------------------+
| 1  | 3979    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 2  | 3717    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 3  | 3626    | 8            | SpecialPlace   | draft | 2015-11-30 10:09:06 +0000 |
| 4  | 3202    | 8            | SpecialPlace   | draft | 2015-11-30 10:09:06 +0000 |
| 5  | 703     | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 6  | 278     | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 7  | 3166    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 8  | 3147    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 9  | 3146    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
| 10 | 3145    | 2            | SpecialEvent   | draft | 2015-11-30 10:09:06 +0000 |
+----+---------+--------------+----------------+-------+---------------------------+


ere
  • Without seeing any data, it might be difficult to answer your question. – Tim Biegeleisen Jun 13 '19 at 05:40
  • You are not clearly explaining what you want or your problems. Say what result you want in terms of base tables or say what base & result table rows say re the business situation. Explain "try to join" & "get an error" in detail. Please in code questions give a [mcve]--cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & clear specification & explanation. That includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) – philipxy Jun 13 '19 at 06:48
  • This seems likely to be a common error where people want the join on a shared key of some aggregations (each possibly involving joining) but they erroneously try to do all the joining then all the aggregating or aggregate over previous aggregations. [Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs](https://stackoverflow.com/a/45252750/3404097) PS There are no "paths" in querying. A relational model table represents a relation(ship)/association. A join holds the rows of values that participate in both inputs' relationships. Constraints need not be known to query. – philipxy Jun 13 '19 at 06:54
  • Updated with more data. – ere Jun 13 '19 at 07:37
  • "You are not clearly explaining what you want or your problems." "cut & paste & runnable code" "code that you show is OK extended by code that you show is not OK" Etc. Etc. PS Please [use text, not images/links, for text--including tables & ERDs.](https://meta.stackoverflow.com/q/285551/3404097) Paraphrase or quote from other text. Use images only for what cannot be expressed as text or to augment text. Images cannot be searched for or cut & pasted. Include a legend/key & explanation with an image. PS Google 'stackexchange notifications' to learn to use @x to notify a user x about a comment. – philipxy Jun 13 '19 at 08:56
  • If you want to get good answers, ask a clear question that makes it easy for people to run code--act on my earlier comments. PS Find out what a table alias (correlation name) is & how to join with the same table twice by using a different alias each time. PS Observe how the answer by holden does exactly what I said--it does some aggregations separately & joins them in on a shared PK/UNIQUE. – philipxy Jun 21 '19 at 07:05
  • Actual table definitions, your version of Postgres and a clarified objective would be instrumental. – Erwin Brandstetter Jun 21 '19 at 16:01

3 Answers

7

What my colleagues mean to say is that the way you want to do it is not feasible; however, there are a myriad of ways to achieve the same thing.

What you could do to avoid the two conflicting joins is create a combined table of both SpecialEvents and SpecialPlaces containing all the information you want, and then JOIN that.

E.g. something like this:

SELECT event_registrations.id,
       array_agg(teams.name),
       event_registrations.number_of_players,
       event_registrations.state,
       event_registrations.created_at,
       array_agg(players.email),
       array_agg(el1.name),
       array_agg(special_places.id)
FROM event_registrations
LEFT JOIN teams ON event_registrations.team_id = teams.id
LEFT JOIN team_memberships ON teams.id = team_memberships.team_id
LEFT JOIN players ON team_memberships.player_id = players.id
LEFT JOIN special_places ON event_registrations.scoreable_id = special_places.id AND event_registrations.scoreable_type = 'SpecialPlace'
LEFT JOIN (
    -- combined lookup: every SpecialPlace with its parent event, plus every SpecialEvent on its own
    SELECT special_events.id AS special_event_id, special_places.id AS special_place_id, special_events.name
    FROM special_places
    LEFT JOIN special_events ON special_places.special_event_id = special_events.id
    UNION
    SELECT special_events.id AS special_event_id, null AS special_place_id, special_events.name
    FROM special_events
) el1
ON (event_registrations.scoreable_id = el1.special_place_id AND event_registrations.scoreable_type = 'SpecialPlace') OR (event_registrations.scoreable_id = el1.special_event_id AND event_registrations.scoreable_type = 'SpecialEvent')
GROUP BY event_registrations.id, event_registrations.number_of_players, event_registrations.state, event_registrations.created_at
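
If it helps readability, the same combined lookup can also be written as a CTE. This is just a sketch of the same idea (assuming Postgres; scoreable_events is only an illustrative name, not an existing table), shown with a trimmed-down select list:

WITH scoreable_events AS (
    -- one row per SpecialPlace, carrying the name of its parent event
    SELECT sp.id AS special_place_id, se.id AS special_event_id, se.name
    FROM special_places sp
    JOIN special_events se ON se.id = sp.special_event_id
    UNION ALL
    -- plus one row per SpecialEvent, for registrations that point at an event directly
    SELECT NULL, se.id, se.name
    FROM special_events se
)
SELECT er.id, el1.name
FROM event_registrations er
LEFT JOIN scoreable_events el1
       ON (er.scoreable_id = el1.special_place_id AND er.scoreable_type = 'SpecialPlace')
       OR (er.scoreable_id = el1.special_event_id AND er.scoreable_type = 'SpecialEvent')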
holden
  • Many thanks @holden: I formerly used a `where exists` with `or` statements to check the various paths in a similar situation, but was running into performance issues. This approach works much, much faster (MariaDB). – bfredo123 Mar 25 '23 at 15:24
4

Assuming that id is the PRIMARY KEY column in each of the given tables, and based on some educated guesses:

SELECT er.id
     , t.name  AS team_name            -- can only be 1, no array_agg
     , er.number_of_players
     , er.state
     , er.created_at
     , tp.player_emails                -- pre-aggregated!
     , se.name AS special_event_name   -- can only be 1, no array_agg
     , sp.id   AS special_place_id     -- can only be 1, no array_agg
FROM   event_registrations   er
LEFT   JOIN teams t ON t.id = er.team_id
LEFT   JOIN (
   SELECT tm.team_id, array_agg(p.email) AS player_emails
   FROM   team_memberships tm
   JOIN   players          p  ON p.id = tm.player_id
   GROUP  BY 1
   ) tp USING (team_id)
LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                              OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'

Much simpler and faster.

Major points

  • If you indeed need to join to the same table twice, you have to use table aliases like:

    FROM   event_registrations er

    which is short for:

    FROM   event_registrations AS er

    Turns out, you do not need to join to the same table twice. Still use table aliases to cut the noise (a minimal aliasing sketch follows after this list).

  • The only identifiable reason for the global GROUP BY in the outer SELECT was the join to team_memberships that could potentially multiply rows. I moved the aggregation of player_emails to a much cheaper subquery, removed the outer GROUP BY and simplified the rest. Should also be substantially faster.

  • If you need GROUP BY in the outer query - and event_registrations.id is indeed the PRIMARY KEY - then this:

    GROUP  BY er.id, er.number_of_players, er.state, er.created_at
    

    ... is just another noisy way of saying:

    GROUP  BY er.id
    

    Since Postgres 9.1, the PK covers all columns of a table in the GROUP BY clause.

    But you don't need that at all.

  • Finally, the core question is solved by joining to special_places conditionally first and then, conditionally again, joining to special_events. Missing columns are filled in with NULL values:

    LEFT   JOIN special_places sp ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace'
    LEFT   JOIN special_events se ON se.id = er.scoreable_id AND er.scoreable_type = 'SpecialEvent'
                                  OR se.id = sp.special_event_id AND er.scoreable_type = 'SpecialPlace'
    

    The final AND er.scoreable_type = 'SpecialPlace' is redundant, strictly speaking, as there can be no sp.special_event_id otherwise anyway. I kept it for clarity.
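
To illustrate the aliasing point above, a minimal sketch (using the tables from the question, with name referenced the way the question's query does) that joins special_events twice under two different aliases. This is what avoids the "table name specified more than once" error, though, as noted, you do not need it here:

SELECT er.id
     , se_direct.name    AS direct_event_name      -- registration points at the event itself
     , se_via_place.name AS event_name_via_place   -- registration points at a place
FROM   event_registrations er
LEFT   JOIN special_events se_direct
       ON  se_direct.id = er.scoreable_id
       AND er.scoreable_type = 'SpecialEvent'
LEFT   JOIN special_places sp
       ON  sp.id = er.scoreable_id
       AND er.scoreable_type = 'SpecialPlace'
LEFT   JOIN special_events se_via_place
       ON  se_via_place.id = sp.special_event_id;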

Erwin Brandstetter
0

Mathematically speaking, the join order does not affect your result (it only affects efficiency).

Having said that, many RDBMS implementations (Postgres included) have features that choose the least costly join order.

If you want to enforce a particular join order (even though it gives you the same answer), you can try brackets. Even then, I'm not sure the query optimizer won't rewrite your query tree to optimize performance, changing the join order.
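
For example, a rough sketch (assuming Postgres): parentheses group the joins explicitly, and the planner setting join_collapse_limit controls whether Postgres may still reorder explicit JOIN syntax.

-- With join_collapse_limit = 1 the planner keeps the join order as written.
SET join_collapse_limit = 1;

SELECT er.id, se.start_at
FROM event_registrations er
LEFT JOIN (special_places sp
           JOIN special_events se ON se.id = sp.special_event_id)
       ON sp.id = er.scoreable_id AND er.scoreable_type = 'SpecialPlace';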

Edward Aung