1

Consider there is a table of job runs history with the following schema:

job_runs
(
    run_id integer not null, -- identifier of the run
    job_id integer not null, -- identifier of the job
    run_number integer not null, -- job run number, run numbers increment for each job
    status text not null, -- status of the run (running, completed, killed, ...)
    primary key (run_id)
    -- ...
)

and it is required to get the last 10 runs with status != 'running' for each job (jobs differ by job_id). To do that I wrote the following query:

SELECT
    *
FROM
    job_runs AS JR1
WHERE
    JR1.run_number IN
    (
        SELECT
            JR2.run_number
        FROM
            job_runs AS JR2
        WHERE
            JR2.job_id = JR1.job_id
            AND
            JR2.status != 'running'
        ORDER BY
            JR2.run_number
        DESC
        LIMIT
            10
    )

It do what I need, but even though there is a multifield index on the job_id and run_num fields of the job_runs table the query is slow, because it scans job_runs table and for each its row runs subquery. The index helps subqueries to run fast each time, but the fact that the nester query scans entire table kills performance. So how can I tune performance of the query?

some thoughts:

Number of jobs (different job_ids) is small and if there were a FOR loop in SQLite it would be easy to loop over all distinct job_ids and run the subquery passing the job id instead of JR1.job_id then UNION all results.

important:

Please don't suggest to run the loop inside the source code of my application. I need pure SQL solution.

Mihran Hovsepyan
  • 10,810
  • 14
  • 61
  • 111
  • I think you're out of luck. [This has been asked before](http://stackoverflow.com/q/4074257/14660) and they came up with your solution. SQLite is very limited, trying to do everything in it isn't always the best choice. Why do you need a pure SQL solution? – Schwern Nov 02 '15 at 18:58
  • You might want to try on [dba.stackexchange.com](http://dba.stackexchange.com). – Schwern Nov 02 '15 at 19:07
  • @Schwern Thank you for dba.stackexchange.com, BTW is there a way to move the question? What about "pure SQL solution", it is an issue of application design. – Mihran Hovsepyan Nov 03 '15 at 08:28
  • If your application is designed so that all data queries must be 100% performed in SQLite you have a serious design problem. SQLite is not flexible enough. You will hit this problem again. I would recommend either switching databases or changing your application design. – Schwern Nov 03 '15 at 19:22
  • No @Schwern it is not the case. – Mihran Hovsepyan Nov 04 '15 at 13:10

1 Answers1

1

You could increase the performance of the subquery further by creating a covering index for it:

CREATE INDEX xxx ON job_runs(job_id, run_number, status);

But the biggest performance problem is that the subquery is executed for each row, although you need to run it only for each unique job ID.

So, first, get just the unique job IDs:

SELECT DISTINCT job_id
FROM job_runs

Then, for each of these IDs, determine the tenth largest run number:

SELECT job_id,
       (SELECT run_number
        FROM job_runs
        WHERE job_id = job_ids.job_id
          AND status != 'running'
        ORDER BY run_number DESC
        LIMIT 1 OFFSET 9
       ) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids

But if there are less than ten run numbers for a job, the subquery returns NULL, so let's replace that with a small number so that the comparion below (run_number >= first_run_number) works:

SELECT job_id,
       IFNULL((SELECT run_number
               FROM job_runs
               WHERE job_id = job_ids.job_id
                 AND status != 'running'
               ORDER BY run_number DESC
               LIMIT 1 OFFSET 9
              ), -1) AS first_run_number
FROM (SELECT DISTINCT job_id
      FROM job_runs) AS job_ids

So now we have the first interesting run for each job. Finally, join these values back to the original table:

SELECT job_runs.*
FROM job_runs
JOIN (SELECT job_id,
             IFNULL((SELECT run_number
                     FROM job_runs
                     WHERE job_id = job_ids.job_id
                       AND status != 'running'
                     ORDER BY run_number DESC
                     LIMIT 1 OFFSET 9
                    ), -1) AS first_run_number
      FROM (SELECT DISTINCT job_id
            FROM job_runs) AS job_ids
     ) AS firsts
  ON job_runs.job_id = firsts.job_id
 AND job_runs.run_number >= firsts.first_run_number;
CL.
  • 173,858
  • 17
  • 217
  • 259