0

The individual entries in my MySQL 5.7 database enable their revision through PHP: Already saved data record sets can, thus, be actualized. For every single actualization, which is nothing else than saving the same data record set n times, a revision number is auto-incrementally set (it starts with 0):

+----+-----------+-----------+-------------------+----------+
| ID | PatientID | SurgeryID | blahblahblah      | revision |
+----+-----------+-----------+-------------------+----------+
| 1  | 8883      | 7493      | Appendectomy      | 0        |
| 2  | 8883      | 7493      | Appendectomy      | 1        |
| 3  | 8883      | 7493      | Lap. Appendectomy | 2        |
+----+-----------+-----------+-------------------+----------+

The data record set which interests me, of course, is the one with the highest number since it is the latest revision:

+----+-----------+-----------+-------------------+----------+
| ID | PatientID | SurgeryID | blahblahblah      | revision |
+----+-----------+-----------+-------------------+----------+
| 3  | 8883      | 7493      | Lap. Appendectomy | 2        |
+----+-----------+-----------+-------------------+----------+

(Please see that the blahblahblah terms are replaced by numbers (vide infra), I have just termed them here for you to better see the issue.)

How can I COUNT these filtered data record sets?

Done so far:

The SQL query code to filter the highest revision data record set looks like this:

SELECT DISTINCT p.ID, p.PatientID, op.PatID, op.SurgeryID, op.blahblahblah, op.revision
FROM patient_table p
LEFT OUTER JOIN surgery_table op ON op.PatID = p.PatientID
WHERE some restrictions
AND p.PatientID = op.PatID  -- possibly redundant
AND NOT EXISTS (SELECT 1
                  FROM surgery_table op2
                  WHERE op2.PatID = p.PatientID AND op2.revision > op.revision
                 );

This SQL query code works fine and delivers the correct results.

Now I want to COUNT only the various surgery forms performed. My SQL query code so far looks like this:

SELECT COUNT(IF(op.blahblahblah = '0',1,NULL)) 'No Nurgery',
       COUNT(IF(op.blahblahblah = '1',1,NULL)) 'Appendectomy',
       COUNT(IF(op.blahblahblah = '2',1,NULL)) 'Lap. Appendectomy',
       [... lots of surgical procedures listed here ...],
       COUNT(IF(op.blahblahblah = '50',1,NULL)) 'Colostomy',
       COUNT(IF(op.blahblahblah = '99',1,NULL)) 'Different Surgery'
FROM surgery_table op
WHERE op.SurgeryDate BETWEEN "2000-01-01" AND "2020-12-31"

This delivers, of course, ALL data record sets irrespective of their revision numbers:

+-------------------+-----------------------------------------------------+
| blahblahblah      | COUNTs of blahblahblah / no latest revision numbers |
+-------------------+-----------------------------------------------------+
| Appendectomy      | 34579                                               |
| Lap Appendectomy  | 23475                                               |
| ...               | ...                                                 |
| Colostomy         | 3547                                                |
| Different Surgery | 49558                                               |
+-------------------+-----------------------------------------------------+

How can I concatenate or merge or whatever the first and the second SQL query to COUNT merely the data records with the highest revision number?

As a result, I want to have a table like this:

+-------------------+-----------------------------------------------------+
| blahblahblah      | COUNTs of blahblahblah / latest revision numbers    |
+-------------------+-----------------------------------------------------+
| Appendectomy      | 3854                                                |
| Lap Appendectomy  | 1473                                                |
| ...               | ...                                                 |
| Colostomy         | 563                                                 |
| Different Surgery | 2534                                                |
+-------------------+-----------------------------------------------------+
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
mtjmohr
  • 99
  • 9
  • I feel like we've covered this. For a refresher, see: [Why should I provide an MCVE for what seems to me to be a very simple SQL query?](https://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query) – Strawberry Feb 22 '19 at 12:32
  • No, we haven't convered the second part of my question, @Strawberry, we have just checked the filtering for the latest revision number outside of the COUNT query. All required information now exists also according to the MCVE. – mtjmohr Feb 22 '19 at 12:47
  • what does the 1 represent in this query? `SELECT 1 FROM surgery_table op2 WHERE op2.PatID = p.PatientID AND op2.revision > op.revision` – Kebab Programmer Feb 22 '19 at 12:48
  • @KebabProgrammer: Please refer to https://stackoverflow.com/questions/7171041/what-does-it-mean-by-select-1-from-table and https://dba.stackexchange.com/questions/159413/exists-select-1-vs-exists-select-one-or-the-other.. You can replace the "1" with basically anything else. – mtjmohr Feb 22 '19 at 12:52
  • Some Sidenotes: In your first query you are trying to outer join, but `p.PatientID = op.PatID` in your `WHERE` clause renders this a mere inner join. Then you apply `DISTINCT` and I don't have the slightest idea what you want to achieve with this. In your second query you are confusing single and double quotes. Double quotes are for names, single quotes are for literals. – Thorsten Kettner Feb 22 '19 at 13:03
  • Thank you, @ThorstenKettner, for your observations, you are right. I am checking the query after changes ... – mtjmohr Feb 22 '19 at 13:20
  • Maybe it exists, but it isn't here. – Strawberry Feb 22 '19 at 13:39
  • The DISTINCT might be superfluous or even wrong here, but it does not affect the quantitative outcome - the resulting numbers are the same, @ThorstenKettner. I have changed the single and double quotes (not here), accordingly. I will check into your other answers now and comment them there individually. – mtjmohr Feb 22 '19 at 13:47

3 Answers3

1

The last surgery record per patient can be got quite easily with aggregation:

select *
from surgery_table
where (patientid, revision) in
(
  select patientid, max(revision)
  from surgery_table
  where surgerydate between date '2000-01-01' and date '2020-12-31'
  group by patientid
);

And of course you can aggregate these records:

select
  sum(blahblahblah = 0) as no_surgery,
  sum(blahblahblah = 1) as lap_appendectomy,
  ...
from surgery_table
where (patientid, revision) in
(
  select patientid, max(revision)
  from surgery_table
  where surgerydate between date '2000-01-01' and date '2020-12-31'
  group by patientid
);
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Thank you very much, the second SQL query does the job nicely (with the sole exception that I receive 4 records too many but this is something surely grounding in minuscule details I still need to look into). – mtjmohr Feb 22 '19 at 14:21
1

Your second query retrieves one row with many columns (one for each treatment). Your expected result shows one row per treatment instead, which I like much better.

Your database seems to be lacking a treatment table. Create one:

+--------------+-------------------+
|treatment_id  | description       |
+--------------+-------------------+
| 1            | Appendectomy      |
| 2            | Lap. Appendectomy |
+--------------+-------------------+

With your surgery_table

+----+-----------+-----------+--------------+----------+
| ID | PatientID | SurgeryID | treatment_id | revision |
+----+-----------+-----------+--------------+----------+
| 1  | 8883      | 7493      | 1            | 0        |
| 2  | 8883      | 7493      | 1            | 1        |
| 3  | 8883      | 7493      | 2            | 2        |
+----+-----------+-----------+--------------+----------+

you can use something like:

select t.treatment, t.description, count(s.id)
from treatment t
left join surgery_table s using(treatment_id)
where (s.patientid, s.revision) in
(
  select patientid, max(revision)
  from surgery_table
  where surgerydate between date '2000-01-01' and date '2020-12-31'
  group by patientid
)
group by treatment_id
order by treatment_id;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Thank you again, indeed. Yes, such a table exists, I have not displayed it here, though. I will check your proposed SQL query. – mtjmohr Feb 22 '19 at 14:22
  • Transforming this into my proper SQL query works as well. A very elegant approach to handling things. Thank you again, Thorsten. – mtjmohr Feb 22 '19 at 20:09
1

In a query you use the FROM clause (including all joins) to say where you want data from and the WHERE clause which of the data to select.

So if you want to work on the same data, then replace your FROM and WHERE clause. I.e. remove

FROM surgery_table op
WHERE op.SurgeryDate BETWEEN "2000-01-01" AND "2020-12-31"

from your second query and replace it with

FROM patient_table p
LEFT OUTER JOIN surgery_table op ON op.PatID = p.PatientID
WHERE some restrictions
AND p.PatientID = op.PatID  -- possibly redundant
AND NOT EXISTS (SELECT 1
                  FROM surgery_table op2
                  WHERE op2.PatID = p.PatientID AND op2.revision > op.revision
                 );

An alternative is to select from a query:

select blahblahblah, count(*)
from (  <your first query here>  ) q
group by blahblahblah;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Thank you for your kind explicitness. I have understood all of your answers. This one is especially helpful also for other queries I do perform in my system. I really appreciate your help, Thorsten. – mtjmohr Feb 22 '19 at 14:26
  • **THIS APPROACH** is delivering the exact results as to be expected (expectation: I know the absolute number of patients and that 100 % of patients have received surgery). **Excellent**, Thorsten! – mtjmohr Feb 22 '19 at 19:17