Say I have a simplified model in which a patient
can have zero or more events
. An event has a category
and a date
. I want to support questions like:
Find all patients that were given a medication after an operation and
the operation happened after an admission.
Where medication, operation and admission are all types of event categories. There are ~100 possible categories.
I'm expecting 1000s of patients and every patient has ~10 events per category.
The naive solution I came up with was to have two tables, a patient
and an event
table. Create an index on event.category
and then query using inner-joins like:
SELECT COUNT(DISTINCT(patient.id)) FROM patient
INNER JOIN event AS medication
ON medication.patient_id = patient.id
AND medication.category = 'medication'
INNER JOIN event AS operation
ON operation.patient_id = patient.id
AND operation.category = 'operation'
INNER JOIN event AS admission
ON admission.patient_id = patient.id
AND admission.category = 'admission'
WHERE medication.date > operation.date
AND operation.date > admission.date;
However this solution does not scale well as more categories/filters are added. With 1,000 patients and 45,000 events I see the following performance behaviour:
| number of inner joins | approx. query response |
| --------------------- | ---------------------- |
| 2 | 100ms |
| 3 | 500ms |
| 4 | 2000ms |
| 5 | 8000ms |
Does anyone have any suggestions on how to optimize this query/data model?
Extra info:
- Postgres 10.6
- In the Explain output,
project_result
is equivalent topatient
in the simplified model.
Advanced use case:
Find all patients that were given a medication within 30 days after an
operation and the operation happened within 7 days after an admission.