0

Assuming a main "job" table, and two corresponding "log" tables (one for server events and the other for user events, with quite different data stored in each).

What would be the best way to return a selection of "job" records and the latest corresponding log record (with multiple fields) from each of the two "log" tables (if there are any).

Did get some inspiration from: MySQL Order before Group by

The following SQL would create some example tables/data...

CREATE TABLE job (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` tinytext NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE job_log_server (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `job_id` int(11) NOT NULL,
    `event` tinytext NOT NULL,
    `ip` tinytext NOT NULL,
    `created` datetime NOT NULL,
    PRIMARY KEY (id),
    KEY job_id (job_id)
);

CREATE TABLE job_log_user (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `job_id` int(11) NOT NULL,
    `event` tinytext NOT NULL,
    `user_id` int(11) NOT NULL,
    `created` datetime NOT NULL,
    PRIMARY KEY (id),
    KEY job_id (job_id)
);

INSERT INTO job VALUES (1, 'Job A');
INSERT INTO job VALUES (2, 'Job B');
INSERT INTO job VALUES (3, 'Job C');
INSERT INTO job VALUES (4, 'Job D');

INSERT INTO job_log_server VALUES (1, 2, 'Job B Event 1', '127.0.0.1', '2000-01-01 00:00:01');
INSERT INTO job_log_server VALUES (2, 2, 'Job B Event 2', '127.0.0.1', '2000-01-01 00:00:02');
INSERT INTO job_log_server VALUES (3, 2, 'Job B Event 3*', '127.0.0.1', '2000-01-01 00:00:03');
INSERT INTO job_log_server VALUES (4, 3, 'Job C Event 1*', '127.0.0.1', '2000-01-01 00:00:04');

INSERT INTO job_log_user VALUES (1, 1, 'Job A Event 1', 5, '2000-01-01 00:00:01');
INSERT INTO job_log_user VALUES (2, 1, 'Job A Event 2*', 5, '2000-01-01 00:00:02');
INSERT INTO job_log_user VALUES (3, 2, 'Job B Event 1*', 5, '2000-01-01 00:00:03');
INSERT INTO job_log_user VALUES (4, 4, 'Job D Event 1', 5, '2000-01-01 00:00:04');
INSERT INTO job_log_user VALUES (5, 4, 'Job D Event 2', 5, '2000-01-01 00:00:05');
INSERT INTO job_log_user VALUES (6, 4, 'Job D Event 3*', 5, '2000-01-01 00:00:06');

One option (only returning 1 field from each table) would be to use nested sub-queries... but the ORDER BY will have to be done in separate queries to the GROUP BY (x2):

SELECT
    *
FROM
    (
        SELECT
            s2.*,
            jlu.event AS user_event
        FROM
            (
                SELECT
                    *
                FROM
                    (
                        SELECT
                            j.id,
                            j.name,
                            jls.event AS server_event
                        FROM
                            job AS j
                        LEFT JOIN
                            job_log_server AS jls ON jls.job_id = j.id
                        ORDER BY
                            jls.created DESC
                    ) AS s1
                GROUP BY
                    s1.id
            ) AS s2
        LEFT JOIN
            job_log_user AS jlu ON jlu.job_id = s2.id
        ORDER BY
            jlu.created DESC
    ) AS s3
GROUP BY
    s3.id;

Which actually seems to perform quite well... just not very easy to understand.

Or you could try to return and sort the log records in two separate sub-queries:

SELECT
    j.id,
    j.name,
    jls2.event AS server_event,
    jlu2.event AS user_event
FROM
    job AS j
LEFT JOIN
    (
        SELECT
            jls.job_id,
            jls.event
        FROM
            job_log_server AS jls
        ORDER BY
            jls.created DESC
    ) AS jls2 ON jls2.job_id = j.id
LEFT JOIN
    (
        SELECT
            jlu.job_id,
            jlu.event
        FROM
            job_log_user AS jlu
        ORDER BY
            jlu.created DESC
    ) AS jlu2 ON jlu2.job_id = j.id
GROUP BY
    j.id;

But this seems to take quite a bit longer to run... possibly because of the amount of records it's adding to a temporary table, which are then mostly ignored (to keep this short-ish, I've not added any conditions to the job table, which would otherwise be only returning active jobs).

Not sure if I've missed anything obvious.

Community
  • 1
  • 1
Craig Francis
  • 1,855
  • 3
  • 22
  • 35

1 Answers1

1

How about the following SQL Fiddle. It produces the same results as both of your queries.

SELECT j.id, j.name, 
( 
  SELECT s.event 
  FROM job_log_server s
  WHERE j.id = s.job_id
  ORDER BY s.id DESC
  LIMIT 1
)AS SERVER_EVENT,
( 
  SELECT u.event 
  FROM job_log_user u
  WHERE j.id = u.job_id
  ORDER BY u.id DESC
  LIMIT 1
)AS USER_EVENT
FROM job j

EDIT SQL Fiddle:

SELECT m.id, m.name, js.event AS SERVER_EVENT, ju.event AS USER_EVENT
FROM 
(
  SELECT j.id, j.name,
  ( 
    SELECT s.id
    FROM job_log_server s
    WHERE j.id = s.job_id
    ORDER BY s.id DESC
    LIMIT 1
  )AS S_E,
  ( 
    SELECT u.id
    FROM job_log_user u
    WHERE j.id = u.job_id
    ORDER BY u.id DESC
    LIMIT 1
  )AS U_E
  FROM job j
) m
LEFT JOIN job_log_server js ON js.id = m.S_E
LEFT JOIN job_log_user ju ON ju.id = m.U_E
Linger
  • 14,942
  • 23
  • 52
  • 79
  • Very good point, and I did forget to mention that I'll need to return multiple fields (unfortunately). – Craig Francis Jul 24 '14 at 17:39
  • @Craig Francis, I updated my answer to give you another option. It will allow you to select multiple fields from any of the three tables. – Linger Jul 24 '14 at 18:10
  • Thanks @Linger, I didn't think about referencing the table twice, first to get the appropriate ID, then to get the required fields... for reference on the live database, without the cache, and returning 526 records (a sub-set of the 6615 records) this takes about 0.025 seconds, the 4 level deep sub-query approach takes about 0.032 seconds, and the two separate sub-query approach was about 0.184 seconds, which seems to be due to the number of records it's excluding. – Craig Francis Jul 25 '14 at 12:50