Background:
First, I have a schema with a table named patients
with which I am concerned with patient_id
and alerts
(alerts is a string of characters, where each character represents some arbitrary value/meaning). Second, each 'patient' is part of a group [family] based solely on the first 6 digits on their patient_id. Also, there are a few third parties that rely on this database; I did not design nor can I change this schema/datamodel and I cannot migrate away from MySQL.
Here is a fiddle with the data model
Challenge:
Now, I need to find occurrences where patients have alerts containing !
, @
, #
, %
, ^
, or &
symbols and their family members do not. My first thought was to gather all patients that have alerts containing these symbols, remove the last digit in each patient_id and then group by this value. Now I have a list of (for all intents and purposes) 'group_ids.' Finally, I need to extend the list to contain each of the groups family members and their respective alerts strings.
Here is what I have so far:
Query #1:
SELECT p.patient_id, p.name_first, p.name_last, p.alerts
FROM patients p
INNER JOIN (SELECT SUBSTRING(patient_id, 1, CHAR_LENGTH(patient_id) - 1) AS group_id
FROM patients
WHERE patient_id BETWEEN 1000000 AND 7999999
AND (alerts like '%!%'
OR alerts like '%@%'
OR alerts like '%#%'
OR alerts like '%\%%'
OR alerts like '%^%'
OR alerts like '%&%')
GROUP BY group_id) g
ON p.patient_id LIKE CONCAT(g.group_id, '%')
ORDER BY p.patient_id
LIMIT 30000;
Fiddle ~ NOTICE: fiddle is not an accurate representation of the problem as included table only has 28 records.
Recordset: 80,000 ~ Results: 2188 ~ Duration: 14.321 sec ~ Fetch: 0.00 sec ~ Total: 14.321 sec
Query #2:
SELECT p.patient_id, p.name_first, p.name_last, p.alerts
FROM patients p
JOIN (SELECT DISTINCT LEFT(patient_id, 6) AS group_id
FROM patients
WHERE patient_id BETWEEN 1000000 AND 7999999
AND alerts REGEXP '[!@#%^&]') g
ON p.patient_id LIKE CONCAT(g.group_id, '%')
ORDER BY p.patient_id
LIMIT 30000;
Fiddle ~ NOTICE: fiddle is not an accurate representation of the problem as included table only has 28 records.
Recordset: 80,000 ~ Results: 2188 ~ Duration: 4.259 sec ~ Fetch: 5.663 sec ~ Total: 9.992 sec
EDIT: after adding name_first, name_last, alerts and order by clause, I found this query took exactly the same amount of time as the first.
Problem:
The list that I get back is accurate but, not only will it take extra processing (I plan to do this with PHP), but also it takes 14 seconds!
If anyone has a better... or can at least point in the direction of a better and more efficient solution, please enlighten me. Thanks in advance.
Extra Credit: any tips on the PHP algo to solve the aforementioned problem with the given data - forget semantics, just a formula will do.