I am trying to improve the performance of a query using a "materialized view" to optimize away joins. The first query below is the original, which employs joins. The second is the query written against a table i generated which includes all the joined data (the equivalent of a materialized view). They both return the same result set. Unfortunalatey, somehow, the second query is MUCH slower when handling a very long set of input ids (the IN clause). I don't understand how that could be!!!! Executing all the joins has to have a fair amount of overheat that is saved by the "materialized view", right?
SELECT
clinical_sample.INTERNAL_ID AS "internalId",
sample.STABLE_ID AS "sampleId",
patient.STABLE_ID AS "patientId",
clinical_sample.ATTR_ID AS "attrId",
cancer_study.CANCER_STUDY_IDENTIFIER AS "studyId",
clinical_sample.ATTR_VALUE AS "attrValue"
FROM clinical_sample
INNER JOIN sample ON clinical_sample.INTERNAL_ID = sample.INTERNAL_ID
INNER JOIN patient ON sample.PATIENT_ID = patient.INTERNAL_ID
INNER JOIN cancer_study ON patient.CANCER_STUDY_ID =
cancer_study.CANCER_STUDY_ID
WHERE cancer_study.CANCER_STUDY_IDENTIFIER = 'xxxxx'
AND sample.STABLE_ID IN
('P-0068343-T02-IM7' , 'P-0068353-T01-IM7' ,
'P-0068363-T01-IM7' , 'P-0068364-T01-IM7' )
AND clinical_sample.ATTR_ID IN
(
'CANCER_TYPE'
);
SELECT
internalId,
sampleId,
patientId,
attrId,
studyId,
attrValue
FROM test
WHERE
sampleId IN ('P-0068343-T02-IM7' , 'P-0068353-T01-IM7' ,
'P-0068363-T01-IM7' , 'P-0068364-T01-IM7' )
AND studyId = 'xxxxx'
AND attrId = 'CANCER_TYPE';
Update: I did notice in Workbench report that the query with joins seems to scan far fewer rows. About 829k vs ~2400k for the second, joinless query. So having joins seems to actually be a major optimization somehow. I have index in sampleId, studyId, attrId and composite of all three.
Both table "test" and "clinical_sample" have the same number of rows.