0

I am attempting to create a query that pulls information from two other tables, however I only know which table to pull from based on a column in another table. I'm currently looking into doing this using a stored procedure (e.g. build the query and then run it) but I wanted to know if there is a better way to do this, or if I could accomplish it in a single query.

In terms of the connections, ID's are unique accross the entire database, so no two ID's will overlap. However I do not know which subtable the ID relates. I am able to find this by pulling in an unrelated table that happens to have the information (call it the Object Table). One of the columns will give me the table name for the information (in my example below, Person). I have drafted a simple example below. Can you see any way I could accomplish this in a single query? Something like this is what I am aiming for but I am starting to think its not possible.

SELECT * FROM base_table 
LEFT JOIN object ON object.id = base_table.role 
LEFT JOIN [object.type] tmp ON tmp.entity_id = base_table.entity_id

id | role | entity_id        (Base Table)
---------------------
1  | 101  | 1000


id | type                   (Objects Table)
------------
101| person

entity_id | name | etc..    (Person Table)
------------------------
1000      | Bob  | ...

I also expect unions might be a possible solution - but other then just joining all the possible tables and parsing the columns to match up properly (which it could be as many as 20 tables) I'd rather not. This solution is also a bit of a nusience since the columns don't always match in a good way (e.g. the Person table doesn't have similar columns to the Address table)

Sh4d0wsPlyr
  • 948
  • 12
  • 28
  • what would be the use case for this? – JamieD77 Jan 12 '16 at 16:13
  • Are there a fixed number of values for [object.type] column, or will the number of possible values grow in the future? – TT. Jan 12 '16 at 16:19
  • To answer in order - the use case is for displaying a list of information related to a base table. E.g. I have a table for people, and a table for entities, where a person might be related to many entities. It is currently a fixed number but may change in the future (it is not a regular change). – Sh4d0wsPlyr Jan 12 '16 at 17:00
  • To be frank, I think you are going about it the wrong way. You should try and solve your business case in a way that does not require the construct you are asking for here. You can of course do LEFT JOINs for all possible detail-tables, and in your select choose the information from the proper table based on your `[object.type]` column (using a CASE statement). But there is no construct to do this dynamically in SQL Server and don't count on it ever coming into being because that's not the way queries are supposed to be built. – TT. Jan 12 '16 at 17:23
  • @TT. I had that feeling. Since its a legacy system I'd like to just implement a simple connection somewhere that is easy to trace - but sadly that probably can't happen. Similar to what you said what I am currently doing right now is to just use left joins and filter the data to exactly what I want to see. It works - but I can't help but feel it should be improved or the system should be reworked. If no-one else comes up with an answer I'll post my solution or remove this question. – Sh4d0wsPlyr Jan 12 '16 at 17:29

2 Answers2

2

What you probably want to do is the following: for each possible detail-table (ie the possible values in [object.value]), write a query that only links with that one detail-table and have a WHERE clause to restrict to the proper entities. Then do a UNION ALL for all those queries.

Say you have Person, Legal Person and Counterpart as possible values in [object.type]. Suppose the detail-tables have the same names. You can write:

SELECT
  bt.*,
  dt.f1,
  -- ...,
  dt.fn
FROM
  base_table AS bt
  INNER JOIN object AS o ON o.id = bt.role
  INNER JOIN Person AS dt ON dt.entity_id = bt.entity_id
WHERE
   o.type='Person'
UNION ALL
SELECT
  bt.*,
  dt.f1,
  -- ...,
  dt.fn
FROM
  base_table AS bt
  INNER JOIN object AS o ON o.id = bt.role
  INNER JOIN [Legal Person] AS dt ON dt.entity_id = bt.entity_id
WHERE
   o.type='Legal Person'
UNION ALL
SELECT
  bt.*,
  dt.f1,
  -- ...,
  dt.fn
FROM
  base_table AS bt
  INNER JOIN object AS o ON o.id = bt.role
  INNER JOIN Counterpart AS dt ON dt.entity_id = bt.entity_id
WHERE
   o.type='Counterpart'
TT.
  • 15,774
  • 6
  • 47
  • 88
  • 1
    i think you missed the point.. `[object.type]` is actually a value in the `object` table.. – JamieD77 Jan 12 '16 at 16:07
  • I think you might have missed something - [object.type] is a column from the object table. – Sh4d0wsPlyr Jan 12 '16 at 16:07
  • @Sh4d0wsPlyr Oh =) let me revisit that then – TT. Jan 12 '16 at 16:11
  • @Sh4d0wsPlyr Note that this will perform better than your idea about writing it in one query, LEFT JOIN-ing to all possible detail tables and choosing columns from detail tables based on [object.type]. – TT. Jan 12 '16 at 17:50
  • Ah, I was about to ask if this was a better solution over using left joins. – Sh4d0wsPlyr Jan 12 '16 at 17:51
  • @Sh4d0wsPlyr Yes. Because in your initial idea a link would be made to all tables for all rows. In the solution I propose a link is only made to the proper table. Do make sure you have the proper index for looking up the type. – TT. Jan 12 '16 at 17:53
  • 1
    @Sh4d0wsPlyr Also note that the individual queries can be executed in parallel, before being UNION'ed. Do take a look at the execution plan once you execute the query. – TT. Jan 12 '16 at 17:57
  • 1
    This will perform very fast is o.type has an index. – Hogan Jan 12 '16 at 18:44
2

I don't think the left join idea is that bad if you just ignore object type.

Since each ID is unique you don't need to look at type at all if you use coalesce. So to use @TT model as an example:

SELECT bt.*,
  COALESCE(P.f1, L.f1, C.f1) AS f1,
  -- ...,
  COALESCE(P.fn, L.fn, C.fn) AS fn
FROM
  base_table AS bt
  LEFT JOIN Person AS P ON P.entity_id = bt.entity_id
  LEFT JOIN [Legal Person] AS L ON L.entity_id = bt.entity_id
  LEFT JOIN Counterpart AS C ON C.entity_id = bt.entity_id

Depending on your data size and indexes this might perform faster or the same as TT's example -- remember there is only 1 select with N joins while TT's has N selects, 2N joins. It really depends on your data.

If there is some field (fz) that does not show up in all types then you just don't inlcude that in the coalesce clause.

I think this style might be easier to maintain and understand and will be the same or faster as TT code.

Hogan
  • 69,564
  • 10
  • 76
  • 117
  • In your comparison, you do not mention that 1) the queries in the UNION ALL construct will be run in [parallel](https://stackoverflow.com/questions/11571036/unioning-in-parallel-will-sql-server-do-it); 2) the cardinalities are different; each query in the UNION ALL construct JOINs over fewer rows than the grand query that links with each entity table. 3) that the extra join in each query in the UNION ALL doesn't have to be with 2 other tables, this can be done with 1 table (the narrowing can be done on base_table.role rather than object.type) – TT. Jan 12 '16 at 19:58
  • Further, 4) the grand query does a key lookup for each row, to all entity tables whereas in the UNION ALL version, key lookup is done only for the corresponding entity. Suppose M rows, N entity tables, that is MxN lookups for the grand query, whereas only M lookups for the UNION ALL query. Your conclusion that your version is at least as performant or more performant does not take enough criteria or the correct criteria into account. Supposing a suitable INDEX on base_table.role, my estimation is that my version will probably perform better than the grand query. – TT. Jan 12 '16 at 19:58
  • @TT. Remember each join can be parallel. Since I have the same number of joins as you have FULL QUERIES mine should be the same or better. Yours has an extra join per query (to object) and an extra filter (where o.type=x). Mine has a call to COALESCE. It really comes down to how well optimized COALESCE is. From my experience it is faster than an extra query on DB2 but not on SQL Server but it really depends on many factors – Hogan Jan 12 '16 at 20:27
  • @TT. I like your solution too -- and I would try both ways for a given model, existing indexes, and platform if time permits. – Hogan Jan 12 '16 at 20:30
  • True, things should always be tested to make definite claims. The OP should compare execution plans and [`SET STATISTICS TIME ON`](https://msdn.microsoft.com/en-us/library/ms190287.aspx) to compare execution time. – TT. Jan 12 '16 at 20:45
  • I am accepting this answer because this is effectively the direction we opted to go. In terms of speed they are roughly the same in our demo environment. This was chosen for the simplicity, and the fact that its only adding a couple of lines per table (e.g. easier to maintain in the long run). – Sh4d0wsPlyr Jan 13 '16 at 14:18