0

I am trying to understand the joining logic behind the following query?? Below is the table that's being used

on t1.log_id-1 = t2.log_id
    where t2.log_id is null

complete query: -

select start_id, min(end_id) as end_id
from (
    select t1.log_id as start_id
    from logs as t1
    left join logs as t2
        on t1.log_id-1 = t2.log_id
    where t2.log_id is null
) tt_start
join (
    select t1.log_id as end_id
    from logs as t1
    left join logs as t2
        on t1.log_id+1 = t2.log_id
    where t2.log_id is null
) tt_end
where start_id<=end_id
group by start_id

Table: -

Log_id
1
2
3
7
8
10
Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • 3
    Please format your SQL to be more readable. – Dai Jan 20 '21 at 17:11
  • 1
    What exactly is your question though? – drum Jan 20 '21 at 17:14
  • Does this answer your question? [How to Join to first row](https://stackoverflow.com/questions/2043259/how-to-join-to-first-row) – Raksha Saini Jan 20 '21 at 18:12
  • 1
    @RakshaSaini almost completely irrelevant. This is a not exists syntax, not a top 1 – Charlieface Jan 21 '21 at 00:01
  • 1
    Please tag SQL requests always with the DBMS you are using. This can be very important, as SQL dialects differ a lot sometimes. – Thorsten Kettner Jan 21 '21 at 00:50
  • @Charlieface Question asks about the logic behind Syntax. and there is more answer available. This question has lots of details regarding Left Join, Right Join and Null. Check this answer:- https://stackoverflow.com/a/27458534/3607051 – Raksha Saini Jan 21 '21 at 02:15
  • Hi Guys, thanks for your comments I am new to this platform and I have not been clear with my question. I wanted to understand what was the logic behind the join condition using ( on t1.log_id-1 = t2.log_id where t2.log_id is null) . Thanks to all who responded :) – nikhil davis Jan 28 '21 at 02:58

3 Answers3

2

It's a kind of not exists logic. It only works when the filtered column cannot be null if there is a matching row.

It's much better to use not exists directly, as the optimizer can understand it better and transform it directly into an anti-join. E.g.:

where not exists (select 1
    from logs as t2
    where t1.log_id-1 = t2.log_id)

The left join construct is often used by people who don't know better, as in most optimizer implementations this construct is not understood well.

For example, in SQL Server, a query plan subtree being guaranteed to only have one row is very useful for certain optimizations. Since left join can in theory double up rows, this guarantee is not there. Even though you and I know this is impossible, there is no logic in the optimizer for this.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
0

The where is not part of the join logic, it is a filter which is only applied after the join logic.

It looks to me like the combination of ON t1.log_id-1 = t2.log_id and WHERE t2.log_id IS NULL ought to give you zero rows. If the value for t2.log_id is null, it cannot also be one less than t1.log_id.

  • The explanation is that this is an *outer* join. t1 rows that have no t2 match get joined with empty t2 columns. `where t2.log_id is null` then removes all matches and only keeps t1 rows that have no match. This method is called **anti join**. As Charlieface explains in their answer this is a rather obfuscated way of merely saying `where not exists ...`. – Thorsten Kettner Jan 21 '21 at 00:56
0

This is a combination of self joins and anti joins.

  1. Self join: A table is joined to itself (here to the row with the ID decremented or incremented by 1).
  2. Anti join: A left outer join and then a WHERE clause to only keep outer-joined rows, thus keeping all rows from the left table that have no match. This is a rather common technique used on young DBMS where joins are already quite optimized and the more straight-forward methods NOT EXISTS and NOT IN are not.

What this query does is:

  1. Find IDs that have no direct predecessor. E.g. for IDs 1, 2, 4, 5, 6, 8, 10, 12, 23, 24 we will find 1, 4, 8, 10, 12, and 23.
  2. Find IDs that have no direct follower. E.g. for IDs 1, 2, 4, 5, 6, 8, 10, 12, 23, 24 we will find 2, 6, 8, 10, 12, and 24.
  3. Join the former with the latter where the former <= the latter: 1-2, 1-6, 1-8, ..., 8-8, 8-10, ..., 23-24.
  4. Get the minimum end ID per start ID: 1-2, 4-6, 8-8, 10-10, 12-12, 23-24.

The query finds thus number ranges. 1, 2, 4, 5, 6, 8, 10, 12, 23, 24 = 1-2, 4-6, 8, 10, 12, 23-24.

This kind of task is called a gaps and island problem. Most often these are solved with window functions:

select min(log(id), max(log_id)
from
(
  select
    log_id,
    log_id - row_number() over (order by log_id) as grp
  from logs
) grouped
group by grp
order by grp;

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3eaeb881c8e5498a02fa0ff34f4cffc3

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73