THE DATA
So, lets say I have a application where I'm testing cars for Speed, Performance, Safety, etc. I have a set of data in the following format:
CAR TABLE
ID CAR_NAME
1 Ford Focus 2006
2 Toyota Corolla 2014
5 Tesla Model S 2016
TEST TABLE
ID TEST_NAME
1 acceleration
2 topSpeed
3 zeroToSixty
4 smoothRide
5 seating
6 ergonomics
7 handling
8 breaking
9 audio
TEST_CATEGORY TABLE
ID CATEGORY_NAME
1 Speed
2 Performance
3 Comfort
4 Safety
TEST_PLAN TABLE
ID CAR_ID CATEGORY_ID TEST_ID
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
5 1 2 3
6 1 2 8
7 1 2 7
8 1 4 8
9 1 4 5
11 5 1 1
12 5 1 2
13 5 1 3
14 5 1 7
15 5 2 1
16 5 2 7
17 5 2 4
18 5 3 4
19 5 3 5
TEST_INSTANCE TABLE
ID CAR_ID TEST_ID CATEGORY_ID GRADE TEST_DATE
1 1 1 1 C 2015-03-14
2 1 1 1 C 2015-03-17
3 1 1 2 D 2015-03-15
4 1 1 2 C 2015-03-16
5 1 1 2 B 2015-03-17
6 1 3 1 B 2015-03-14
7 1 3 1 C 2015-03-16
8 1 8 2 C 2015-03-13
9 5 1 1 B 2015-03-15
10 5 1 1 B 2015-03-17
11 5 1 2 A 2015-03-19
12 5 1 2 A 2015-03-12
13 5 7 2 B 2015-03-14
14 5 7 2 B 2015-03-16
15 5 7 2 A 2015-03-18
16 5 7 2 B 2015-03-15
17 5 9 5 A 2015-03-14 --note that this test is not part of the test plan
So there's 5 tables, but only the last two (TEST_PLAN, TEST_INSTANCE) are important. The rest are just JOIN dependencies.
TEST_PLAN Table
The TEST_PLAN table defines a series of tests that need to be conducted against each car. Each test can be in more than one category. The TEST_PLAN table shows what tests need to be conducted and in what category. An acceleration test under the speed category, does not count for an acceleration test under performance.
The test plans look a bit like this:
Ford Focus 2006
Speed
acceleration
topSpeed
zeroToSixty
Performance
acceleration
zeroToSixty
breaking
handling
Safety
breaking
seating
Tesla Model S 2016
Speed
acceleration
topSpeed
zeroToSixty
handling
Performance
acceleration
handling
smoothRide
Comfort
smoothRide
seating
TEST_INSTANCE Table
The TEST_INSTANCE table tracks which tests (under which category) have been conducted and when. A given test-category may have been conducted for a given car more than once. Tests that are under no test plan may also have been conducted.
The TEST_INSTANCE table (all JOINed up) looks like this:
ID CAR TEST_NAME TEST_CATEGORY GRADE TEST_DATE
1 Ford Focus 2006 acceleration Speed C 2015-03-14
2 Ford Focus 2006 acceleration Speed C 2015-03-17
3 Ford Focus 2006 acceleration Performance D 2015-03-15
4 Ford Focus 2006 acceleration Performance C 2015-03-16
5 Ford Focus 2006 acceleration Performance B 2015-03-17
6 Ford Focus 2006 zeroToSixty Speed B 2015-03-14
7 Ford Focus 2006 zeroToSixty Speed C 2015-03-16
8 Ford Focus 2006 breaking Performance C 2015-03-13
9 Tesla Model S 2016 acceleration Speed B 2015-03-17
10 Tesla Model S 2016 acceleration Speed B 2015-03-15
11 Tesla Model S 2016 acceleration Performance A 2015-03-16
12 Tesla Model S 2016 acceleration Performance A 2015-03-14
13 Tesla Model S 2016 handling Performance B 2015-03-19
14 Tesla Model S 2016 handling Performance B 2015-03-12
15 Tesla Model S 2016 handling Performance A 2015-03-13
16 Tesla Model S 2016 handling Performance B 2015-03-18
17 Tesla Model S 2016 zeroToSisty Speed A 2015-03-14 --note that this test is not part of the test plan
What I'm Trying to do
I need to do several things involving aggregation. I'm trying to get a list of all tests in a given test plan (for a single car) and JOIN the folloing from test instance:
The date the given test was first run in a given category.
The most recent date a given test was run in a category.
The total number of times a test was run in a given category.
Worst result for a given test-category
Most recent result for a given test-category.
I'm using the following query, and I've figured out how to get 1-4. I'm still having trouble on 5:
SELECT
tp.id,
c.car_name
tc.category_name,
t.test_name,
MIN(ti.test_date) as firstRun, -- this grabs the date of the first time the test-category ran
MAX(ti.test_date) as latest, -- this grabs the latest date the test-category ran
MIN(ti.grade) as worst, -- this grabs the worst result fron the test-category
COUNT(ti.test_date) as testRuns, -- this grabs the total number of tests run from the category.
ti.grade as latestGrade -- this seems to give me the most recent grade from a test-category, but I can't be sure.
FROM
testPlan tp
LEFT JOIN car c ON c.id = tp.car_id
LEFT JOIN test_category tc ON tc.id = tp.category_id
LEFT JOIN test t ON t.id = tp.test_id
LEFT JOIN test_instance ti ON (
ti.car_id = tp.car_id
AND ti.category_id = tp.category_id
AND ti.test_id = tp.test_id
)
WHERE
tp.car_id = 5
GROUP BY
tp.id;
For item number 5, ti.grade as latestGrade
seems to give me the most recent grade for a given test-category, but I have no way of being assured of that. I think I need some sort of ORDER BY
to be certain.
EXPECTED RESULTS
The final query should return something like this:
ID CAR TEST_CATEGORY TEST_NAME FIRST_RUN LATEST TESTS_RUN LATEST_GRADE WORST_GRADE
11 Tesla Model S Speed acceleration 2015-03-15 2015-03-17 2 B B
12 Tesla Model S Speed topSpeed NULL NULL 0 NULL NULL
13 Tesla Model S Speed zeroToSixty NULL NULL 0 NULL NULL
14 Tesla Model S Speed handling NULL NULL 0 NULL NULL
15 Tesla Model S Performance acceleration 2015-03-14 2015-03-16 2 A A
16 Tesla Model S Performance handling 2015-03-12 2015-03-19 4 A B
17 Tesla Model S Performance smoothRide NULL NULL 0 NULL NULL
18 Tesla Model S Comfort smoothRide NULL NULL 0 NULL NULL
19 Tesla Model S Comfort seating NULL NULL 0 NULL NULL
Resultset Note:
- there should be only one row per each test-category in the test plan (per car). (9 items in Tesla S test plan).
QUESTION
- What do I need to add to the query to ensure that latestGrade will always be the most recent?
- Will
UNION ALL
cause a performance hit when joining non-plan results? (my real dataset is 10k+ records returned). Is there a way to refactor my query so that it would not need theUNION ALL
?