1

I have three tables:

Table 1: | dbo.pc_a21a22 |

    batchNbr                   Other columns...
    --------                   ----------------
      12345  
      12346
      12347

Table 2: | dbo.outcome |

   passageId                 record
   ----------               ---------
      00003                    200
      00003                     9
      00004                     7

Table 3: | dbo.passage |

   passageId                 passageTime        batchNbr
   ----------               -------------       ---------
      00001                   2015.01.01         12345
      00002                   2016.01.01         12345
      00003                   2017.01.01         12345
      00004                   2018.01.01         12346

What I want to do: for each batchNbr in Table 1 get first its latest passageTime and the corresponding passageID from Table 3. With that passageID, get the relevant rows in Table 2 and establish whether any of these rows contains the record 200. Per passageId there are at most 2 records in Table 2

What is the most efficient way to do this?

I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.

My current solution (slow):

SELECT TOP 50000
    a.batchNbr,
    CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
    SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults 
    ) 
    ) then 1 else 0 end as bit) as KGT_IO_END

    FROM dbo.pc_a21a22 a

The desired output is:

batchNbr     200present
---------    ----------
12345            1
12346            0
AlexGuevara
  • 932
  • 11
  • 28

2 Answers2

3

I suggest you use table joining rather than subqueries.

select 
    a.*, b.*
from 
    dbo.table1 a 
join 
    dbo.table2 b on a.id = b.id 
where 
    /*your where clause for filtering*/

EDIT:

You could use this as a reference Join vs. sub-query

Community
  • 1
  • 1
onhax
  • 199
  • 12
  • 1
    It has to be `select a.*` - not `select *.a` ..... but otherwise: perfect response! – marc_s Apr 07 '17 at 06:37
  • @marc_s - Thank you sir! Missed that one. – onhax Apr 07 '17 at 06:40
  • 2
    @AlexGuevara: using a `JOIN` is typically a lot more efficient than a subquery. That's a known fact and one of the first points to look at in such a situation. I guess the query optimizer can more easily "understand" and optimize a JOIN, rather than a subquery... – marc_s Apr 07 '17 at 06:42
  • 1
    @AlexGuevara - take a look at [this](https://dev.mysql.com/doc/refman/5.7/en/rewriting-subqueries.html) – onhax Apr 07 '17 at 06:43
  • what about the latest passageTime condition, how would you rewrite that? – AlexGuevara Apr 07 '17 at 06:54
  • @AlexGuevara can you add your desired output on your question? – onhax Apr 07 '17 at 07:01
  • @TheNoob - just did. BatchNbr 12345 delivers one because the latest passageTime is associated with passageId 3, for which the record 200 is present. In the case of BatchNbr 12346, the latest passageTime has passageId 4, which has no record 200. – AlexGuevara Apr 07 '17 at 07:07
  • @AlexGuevara - seems like you have to use subuquery for your second column "200present". Do you mean the row count for column "Record"? – onhax Apr 07 '17 at 07:31
  • @TheNoob - basically it's just a Boolean which indicates not the count of 200 in the column record, but it's presence for that batchNbr. In my data, batchNbr is a unique Id, and the code 200 means that examination was succesful all in all. So basically I want to know for each Id (batchNbr), whether I have the 200 value present as a record in the outcome table or not. Could you please provide your full query for the problem (even with unavoidable subquery)? – AlexGuevara Apr 07 '17 at 08:14
1

Try this

SELECT TOP 50000 a.*, (CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS 
KGT_IO_END
FROM dbo.Test1 AS a 
LEFT OUTER JOIN 
(SELECT record, p.batchNbr
 FROM dbo.Test2 AS o
 LEFT OUTER JOIN (SELECT MAX(passageId) AS passageId, batchNbr FROM 
 dbo.Test3 GROUP BY batchNbr) AS p ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr;

The MAX subquery is to get the latest passageId by batchNbr. However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.

I used LEFT OUTER JOIN since the passageId from Table2 no longer match any of the latest passageId from Table3. The resulting subquery would have no records to join to Table1. Therefore INNER JOIN would not show any records from your example data.

Output from your example data:

batchNbr    KGT_IO_END
  12345         0
  12346         0
  12347         0

Output if we change the passageId of record 200 to 00003 (the latest for 12345)

batchNbr    KGT_IO_END
  12345         1
  12346         0
  12347         0
CurseStacker
  • 1,079
  • 8
  • 19
  • thank you for the query! You are right about the passageId issue, I didn't see it when I typed my example - just edited the original post. But you did get my intention :-) – AlexGuevara Apr 07 '17 at 07:16
  • So did it work for the real records? Let me know if you need adjustment. – CurseStacker Apr 07 '17 at 07:22
  • I don't fully get the why the part b.record = 200 works in your first line, since there are two records for that passageId? Wouldn't it return two records? – AlexGuevara Apr 07 '17 at 07:36
  • it did not work for the real records, there are duplicate lines for the records, I think due to the = 200 clause, instead of checking whether 200 is present in any of the two records. – AlexGuevara Apr 07 '17 at 07:50
  • Add a DISTINCT to @CurseStacker 's solution. – onhax Apr 07 '17 at 08:27
  • If you can provide some sample data so we can simulate your case – CurseStacker Apr 07 '17 at 12:06