1

We have a table which stores events (alerts based on machine learning). Each alert provides some information about future possible failures that might happen. Alerts that meet some criteria are special and planned maintenance is scheduled to avoid future failure based on information from such alert.

The table schema is similar to:

|id|fin|datetime|failure|
-------------------------
|1 |fin1|2016-04-02|0|
|2 |fin1|2016-05-02|1|
|3 |fin2|2017-05-02|0|
|4 |fin2|2017-05-12|0|
|5 |fin2|2017-05-01|1|
|6 |fin3|2017-12-01|1|

where fin describes the part which failed or may fail in near future and failure is binary and indicates whether the event is an alert or an actual failure (we need both in same table).

Now what we want is to get all alerts which are labeled as planned maintenance and also the next failure after each such alert (for same FIN of course as the FIN of the alert).

Note we have the ids of those alerts that are labeled as planned maintenance from another table based on another evaluation process. You can consider this as given eg in a list of planned_ids (for example this list may be [1,3] to match the sample data above)

Right now we solve this problem by making multiple requests in DB (mysql but not that important), first to get all events that are labeled as planned maintenance (we have the ids of those events) by using, for example,

select *
from events
where id in [planned_ids]
order by datetime asc -- we dont mind if this is ordered desc as well

Then for each such alert we fetch the next failure for same FIN (functional part) that comes immediately AFTER that event and is a failure. For example by using:

-- using $alert as kind of variable here to denote that the query
-- runs for each alert based on that same alert data i.e datetime and fin part
select *
from events
where fin=$alert.fin and datetime>=$alert.datetime and failure=1
order by datetime desc
limit 0,1

This works but is a kind of N+1 problem.

Can we fetch the alerts and also single next failure for each alert in a single sql query?

Nikos M.
  • 8,033
  • 4
  • 36
  • 43
  • @Strawberry, what else do you think are needed to make the question better? I included a sample table schema and sample queries of what we have right now. Do you think an sqlfiddle is needed,or something else? – Nikos M. May 02 '19 at 17:04

1 Answers1

1

A correlated subquery, perhaps:

select e.*,
       (select e2.id
        from events e2
        where e2.fin = e.fin and e2.datetime > e.datetime
        order by e2.datetime desc
        limit 1
       ) as next_event_id
from events e
where e.id in ( . . . )

If you want additional information, you can use this as a subquery and join back to the events table.

select p.*, nexte.*
from (select e.*,
             (select e2.id
              from events e2
              where e2.fin = e.fin and e2.datetime > e.datetime
              order by e2.datetime desc
              limit 1
             ) as next_event_id
      from events e
      where e.id in ( . . . )
     ) p left join
     events nexte
     on nexte.id = p.next_event_id;

According to MySQL documentation on performance of correlated queries it is stated:

For certain cases, a correlated subquery is optimized. For example:

val IN (SELECT key_val FROM tbl_name WHERE correlated_condition)

Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

Also according to this article:

We have shown that under some circumstances, correlated subqueries can be better than bulk aggregation. In Oracle. With small-medium sized data sets. In other cases, that’s not true as the size of M and N, our two algorithmic complexity variables increase, O(M log N) will be much worse than O(M + N).

Also it seems a kind of N+1 problem still exists (although less severe performance-wise than having N separate queries) even if using correlated subqueries to construct a single query to fetch our data.

Nikos M.
  • 8,033
  • 4
  • 36
  • 43
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • yes I see, the planned events we have in a list of ids which we know are planned events, not a table planned. Can you modify your sql to account for this, effectively select id in [1,3] for example (planned events) – Nikos M. May 02 '19 at 17:54
  • also is it possible to have the full data of next failure (effectively we need the datetime only) or we can only ftech the id? – Nikos M. May 02 '19 at 17:55
  • @NikosM. . . . I address the second issue in the answer: use this query as a subquery and join back to the `events` table. – Gordon Linoff May 02 '19 at 18:00
  • can you add an example of joining back to events table to get all fields of next failure, since in reality we need more than one field of the next failure (sample schema in question is simpler than what we need)?. Thank you! – Nikos M. May 02 '19 at 18:04
  • also what happens if there are no next failures? this is an issue our current appoach takes into account correctly, how about this approach, how are we supposed to handle it? Effectively we would be happy with a null value or values – Nikos M. May 02 '19 at 18:07
  • how about the performance of correlated queries? is it better than N+1 as we have or is it effectively a N+1 query since for each selected row, a sub-select is made? Unless my understanding is wrong I think this is the case, of course it is a single query – Nikos M. May 02 '19 at 19:23
  • i took the liberty to update your answer with some points related to correlated subquery performance and the N+1 problem. Thank you +1 – Nikos M. May 03 '19 at 09:41