0

THE DATA

So, lets say I have a application where I'm testing cars for Speed, Performance, Safety, etc. I have a set of data in the following format:

CAR TABLE
ID  CAR_NAME
1   Ford Focus 2006   
2   Toyota Corolla 2014
5   Tesla Model S 2016

TEST TABLE
ID  TEST_NAME
1   acceleration
2   topSpeed
3   zeroToSixty
4   smoothRide
5   seating
6   ergonomics
7   handling
8   breaking
9   audio

TEST_CATEGORY TABLE
ID  CATEGORY_NAME
1   Speed
2   Performance
3   Comfort
4   Safety

TEST_PLAN TABLE
ID  CAR_ID  CATEGORY_ID TEST_ID 
1   1       1           1       
2   1       1           2       
3   1       1           3       
4   1       2           1       
5   1       2           3       
6   1       2           8       
7   1       2           7       
8   1       4           8       
9   1       4           5       
11  5       1           1       
12  5       1           2       
13  5       1           3       
14  5       1           7       
15  5       2           1       
16  5       2           7       
17  5       2           4       
18  5       3           4       
19  5       3           5       

TEST_INSTANCE TABLE
ID  CAR_ID  TEST_ID CATEGORY_ID GRADE  TEST_DATE
1   1       1       1           C       2015-03-14
2   1       1       1           C       2015-03-17
3   1       1       2           D       2015-03-15
4   1       1       2           C       2015-03-16
5   1       1       2           B       2015-03-17
6   1       3       1           B       2015-03-14
7   1       3       1           C       2015-03-16
8   1       8       2           C       2015-03-13   
9   5       1       1           B       2015-03-15
10  5       1       1           B       2015-03-17
11  5       1       2           A       2015-03-19
12  5       1       2           A       2015-03-12
13  5       7       2           B       2015-03-14
14  5       7       2           B       2015-03-16
15  5       7       2           A       2015-03-18
16  5       7       2           B       2015-03-15
17  5       9       5           A       2015-03-14 --note that this test is not part of the test plan

So there's 5 tables, but only the last two (TEST_PLAN, TEST_INSTANCE) are important. The rest are just JOIN dependencies.

TEST_PLAN Table

The TEST_PLAN table defines a series of tests that need to be conducted against each car. Each test can be in more than one category. The TEST_PLAN table shows what tests need to be conducted and in what category. An acceleration test under the speed category, does not count for an acceleration test under performance.

The test plans look a bit like this:
Ford Focus 2006
    Speed
        acceleration
        topSpeed
        zeroToSixty
    Performance
        acceleration
        zeroToSixty
        breaking
        handling
    Safety
        breaking
        seating

Tesla Model S 2016
    Speed
        acceleration
        topSpeed
        zeroToSixty
        handling
    Performance
        acceleration
        handling
        smoothRide
    Comfort
        smoothRide
        seating

TEST_INSTANCE Table

The TEST_INSTANCE table tracks which tests (under which category) have been conducted and when. A given test-category may have been conducted for a given car more than once. Tests that are under no test plan may also have been conducted.

The TEST_INSTANCE table (all JOINed up) looks like this:
ID  CAR                 TEST_NAME       TEST_CATEGORY   GRADE   TEST_DATE
1   Ford Focus 2006     acceleration    Speed           C       2015-03-14  
2   Ford Focus 2006     acceleration    Speed           C       2015-03-17  
3   Ford Focus 2006     acceleration    Performance     D       2015-03-15  
4   Ford Focus 2006     acceleration    Performance     C       2015-03-16  
5   Ford Focus 2006     acceleration    Performance     B       2015-03-17  
6   Ford Focus 2006     zeroToSixty     Speed           B       2015-03-14  
7   Ford Focus 2006     zeroToSixty     Speed           C       2015-03-16  
8   Ford Focus 2006     breaking        Performance     C       2015-03-13
9   Tesla Model S 2016  acceleration    Speed           B       2015-03-17  
10  Tesla Model S 2016  acceleration    Speed           B       2015-03-15  
11  Tesla Model S 2016  acceleration    Performance     A       2015-03-16  
12  Tesla Model S 2016  acceleration    Performance     A       2015-03-14  
13  Tesla Model S 2016  handling        Performance     B       2015-03-19  
14  Tesla Model S 2016  handling        Performance     B       2015-03-12  
15  Tesla Model S 2016  handling        Performance     A       2015-03-13  
16  Tesla Model S 2016  handling        Performance     B       2015-03-18
17  Tesla Model S 2016  zeroToSisty     Speed           A       2015-03-14  --note that this test is not part of the test plan

What I'm Trying to do

I need to do several things involving aggregation. I'm trying to get a list of all tests in a given test plan (for a single car) and JOIN the folloing from test instance:

  1. The date the given test was first run in a given category.

  2. The most recent date a given test was run in a category.

  3. The total number of times a test was run in a given category.

  4. Worst result for a given test-category

  5. Most recent result for a given test-category.

I'm using the following query, and I've figured out how to get 1-4. I'm still having trouble on 5:

SELECT
    tp.id,
    c.car_name
    tc.category_name,
    t.test_name,
    MIN(ti.test_date) as firstRun,       -- this grabs the date of the first time the test-category ran
    MAX(ti.test_date) as latest,         -- this grabs the latest date the test-category ran
    MIN(ti.grade) as worst,              -- this grabs the worst result fron the test-category
    COUNT(ti.test_date) as testRuns,     -- this grabs the total number of tests run from the category.
    ti.grade as latestGrade              -- this seems to give me the most recent grade from a test-category, but I can't be sure.
FROM
    testPlan tp
    LEFT JOIN car c ON c.id = tp.car_id
    LEFT JOIN test_category tc ON tc.id = tp.category_id
    LEFT JOIN test t ON t.id = tp.test_id
    LEFT JOIN test_instance ti ON (
        ti.car_id = tp.car_id
        AND ti.category_id = tp.category_id
        AND ti.test_id = tp.test_id
    )
WHERE
    tp.car_id = 5
GROUP BY
    tp.id;

For item number 5, ti.grade as latestGrade seems to give me the most recent grade for a given test-category, but I have no way of being assured of that. I think I need some sort of ORDER BY to be certain.

EXPECTED RESULTS

The final query should return something like this:
ID  CAR             TEST_CATEGORY       TEST_NAME       FIRST_RUN   LATEST      TESTS_RUN   LATEST_GRADE    WORST_GRADE
11  Tesla Model S   Speed               acceleration    2015-03-15  2015-03-17  2           B               B
12  Tesla Model S   Speed               topSpeed        NULL        NULL        0           NULL            NULL
13  Tesla Model S   Speed               zeroToSixty     NULL        NULL        0           NULL            NULL
14  Tesla Model S   Speed               handling        NULL        NULL        0           NULL            NULL
15  Tesla Model S   Performance         acceleration    2015-03-14  2015-03-16  2           A               A
16  Tesla Model S   Performance         handling        2015-03-12  2015-03-19  4           A               B
17  Tesla Model S   Performance         smoothRide      NULL        NULL        0           NULL            NULL
18  Tesla Model S   Comfort             smoothRide      NULL        NULL        0           NULL            NULL
19  Tesla Model S   Comfort             seating         NULL        NULL        0           NULL            NULL

Resultset Note:
- there should be only one row per each test-category in the test plan (per car).  (9 items in Tesla S test plan).

QUESTION

  1. What do I need to add to the query to ensure that latestGrade will always be the most recent?
  2. Will UNION ALL cause a performance hit when joining non-plan results? (my real dataset is 10k+ records returned). Is there a way to refactor my query so that it would not need the UNION ALL?
mcmurphy
  • 781
  • 16
  • 30
  • SELECT GRADE ,Max(TEST_DATE) from my_table group by grade – Strawberry Nov 04 '16 at 21:50
  • @Strawberry... not 100% sure I understand. Are you suggesting simply adding a `GROUP BY grade` to my `GROUP BY`? My resultset should have one row per test-plan entry. Adding the group by causes it to have more. – mcmurphy Nov 04 '16 at 22:05
  • 1-4 can be done using simple `MIN()` and `MAX()` grouped by category. 5 requires joining with a query like the ones in http://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column?rq=1. – Barmar Nov 04 '16 at 22:10

1 Answers1

1

Use this "simple" query to get 1-4 and the highest ID for latest grade:

SELECT ti.car_id, ti.category_id, ti.test_id,
    MIN(ti.test_date) as firstRun,
    MAX(ti.test_date) as latest,
    MIN(ti.grade) as worst,
    COUNT(ti.test_date) as testRuns,
    MAX(ti.ID) as max_id -- use later to get latest grade
FROM test_instance ti
GROUP BY ti.car_id, ti.category_id, ti.test_id

Result for your test data:

| CAR_ID | CATEGORY_ID | TEST_ID |   firstRun |     latest | worst | testRuns | max_id |
|--------|-------------|---------|------------|------------|-------|----------|--------|
|      1 |           1 |       1 | 2015-03-14 | 2015-03-17 |     C |        2 |      2 |
|      1 |           1 |       3 | 2015-03-14 | 2015-03-16 |     B |        2 |      7 |
|      1 |           2 |       1 | 2015-03-15 | 2015-03-17 |     B |        3 |      5 |
|      1 |           2 |       8 | 2015-03-13 | 2015-03-13 |     C |        1 |      8 |
|      5 |           1 |       1 | 2015-03-15 | 2015-03-17 |     B |        2 |     10 |
|      5 |           2 |       1 | 2015-03-12 | 2015-03-19 |     A |        2 |     12 |
|      5 |           2 |       7 | 2015-03-14 | 2015-03-18 |     A |        4 |     16 |
|      5 |           5 |       9 | 2015-03-14 | 2015-03-14 |     A |        1 |     17 |

http://sqlfiddle.com/#!9/6df83/4

And join the result with other tables:

SELECT sub.firstRun, sub.latest, sub.worst, sub.testRuns,
    ti.grade as latestGrade,
    c.car_name,
    tc.category_name,
    t.test_name,
    tp.id as test_plan_id
FROM (
    SELECT ti.car_id, ti.category_id, ti.test_id,
        MIN(ti.test_date) as firstRun,
        MAX(ti.test_date) as latest,
        MIN(ti.grade) as worst,
        COUNT(ti.test_date) as testRuns,
        MAX(ti.ID) as max_id -- use later to get latest grade
    FROM test_instance ti
    GROUP BY ti.car_id, ti.category_id, ti.test_id
) sub
JOIN test_instance ti ON ti.id = sub.max_id
JOIN car c            ON c.id  = sub.car_id
JOIN test_category tc ON tc.id = sub.category_id
JOIN test t           ON t.id  = sub.test_id
LEFT JOIN test_plan tp
    ON  tp.car_id      = sub.car_id
    AND tp.category_id = sub.category_id
    AND tp.test_id     = sub.test_id

Result:

|   firstRun |     latest | worst | testRuns | latestGrade |           CAR_NAME | CATEGORY_NAME |    TEST_NAME | test_plan_id |
|------------|------------|-------|----------|-------------|--------------------|---------------|--------------|--------------|
| 2015-03-14 | 2015-03-17 |     C |        2 |           C |    Ford Focus 2006 |         Speed | acceleration |            1 |
| 2015-03-14 | 2015-03-16 |     B |        2 |           C |    Ford Focus 2006 |         Speed |  zeroToSixty |            3 |
| 2015-03-15 | 2015-03-17 |     B |        3 |           B |    Ford Focus 2006 |   Performance | acceleration |            4 |
| 2015-03-13 | 2015-03-13 |     C |        1 |           C |    Ford Focus 2006 |   Performance |     breaking |            6 |
| 2015-03-15 | 2015-03-17 |     B |        2 |           B | Tesla Model S 2016 |         Speed | acceleration |           11 |
| 2015-03-12 | 2015-03-19 |     A |        2 |           A | Tesla Model S 2016 |   Performance | acceleration |           15 |
| 2015-03-14 | 2015-03-18 |     A |        4 |           B | Tesla Model S 2016 |   Performance |     handling |           16 |

http://sqlfiddle.com/#!9/6df83/5

If the test is not "part of" test plan, test_plan_id will be NULL.

You might not even need the test_plan_id so you could remove the that column and the last JOIN.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • +1 for the cool sub-query. That was helpful. The main query is returning too many results. I'm trying to return exactly one row per item in the test_plan table. I've edited the question to add the resultset I'm trying to get to. – mcmurphy Nov 05 '16 at 00:29
  • I also refined the question to focus only on getting data if its from a test on the test plan. I'll ask another question about the UNION_ALL if I need to. – mcmurphy Nov 05 '16 at 00:31
  • You have multiple tests per cathegory. Which `test_name` do you want to select, if you want one row per category? – Paul Spiegel Nov 05 '16 at 01:15