-2

I am able to pull out two lists from my tables. One shows all the units for each student in each cohort. Another shows if all the parts of each unit have been submitted, for a particular student's work in a specific cohort. I want to join the lists, so that I can see who has submitted (or not) each part of each unit, for each student in each cohort.

cohort_units:
cohort_id    unit    part
235          ABC     A
235          ABC     B
246          DEF     A
246          DEF     B
246          DEF     C

cohort_students:
user_id      cohort_id
21           235
24           235
43           235
53           246

assignments:
user_id    cohort_id    unit     draft1recdt 
21         235          ABCA     2023-01-03
21         235          ABCB     NULL
24         235          ABCA     2023-02-01
24         235          ABCB     2023-02-02  

This pulls a list of units with the user id and cohort id.

SELECT cohort_students.user_id,
       cohort_units.unit,
       cohort_units.cohort_id 
FROM cohort_units 
LEFT JOIN cohort_students 
    ON cohort_units.cohort_id = cohort_students.cohort_id 
GROUP BY cohort_units.unit,cohort_students.user_id 
ORDER BY cohort_students.user_id;

result:
user_id    unit    cohort_id
21         ABC    235
24         ABC    235
43         ABC    235
53         DEF    236

This returns a row IF there are more parts to an assignment than parts that have been submitted, for each unit that each student in each cohort should have completed given the cohort id, user id and unit name.

SELECT GROUP_CONCAT(CASE WHEN draft1recdt IS NOT NULL THEN draft1recdt END) AS drafts,
       (LENGTH(GROUP_CONCAT(DISTINCT draft1recdt))-LENGTH(REPLACE(GROUP_CONCAT(DISTINCT draft1recdt), ',', '')))+1 as numDrafts,
       cohort_units.unit,
       GROUP_CONCAT(cohort_units.part) as parts,
       (LENGTH(GROUP_CONCAT(DISTINCT cohort_units.part))-LENGTH(REPLACE(GROUP_CONCAT(DISTINCT cohort_units.part), ',', '')))+1 as numParts 
FROM assignments 
LEFT JOIN cohort_units 
    ON assignments.cohort_id = cohort_units.cohort_id 
    AND assignments.unit = CONCAT(cohort_units.unit,cohort_units.part) 
WHERE assignments.cohort_id = 235 
    AND cohort_units.unit = 'ABC' AND assignments.user_id = 21 
GROUP BY cohort_units.unit 
HAVING numParts > numDrafts;

How do I make the second select statement part of the first, using the three columns on the first select statement as the joining information?

I want to run the second query on every result from the first query. Using the data above, I would expect to pull out user id 21 as they have only submitted one part of a two part unit.

user_id    unit   cohort_id   parts   numParts   numDrafts
21         ABC    235         A,B     2          1

Is this a JOIN? Or a SUBQUERY?

Dan
  • 1,154
  • 1
  • 7
  • 14
  • 1
    Please clarify via edits, not comments. Please ask 1 (clear specific researched non-duplicate) question. Debug questions require a [mre]. Please avoid social & meta commentary in posts. [ask] [Help] "splice" means nothing in particular. – philipxy Feb 17 '23 at 11:25
  • 1
    "merge" doesn't mean anything more than "splice"--both are general & vague & tell us nothing. [mre] – philipxy Feb 17 '23 at 11:31
  • A [mre] includes cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] PS When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Feb 17 '23 at 11:40
  • When giving a business relation(ship)/association or table (base or query result), say what a row in it states about the business situation in terms of its column values. [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Feb 17 '23 at 11:41
  • 2
    It would help a lot if you [edit]ed your question to show us the output you require from the sample data you've shown us. – O. Jones Feb 17 '23 at 11:43
  • "join" is an operator and now in the title you are using it to mean something else generic & vague that you don't even actually think might be a "join". – philipxy Feb 17 '23 at 11:51
  • 1
    Here's a fiddle with the data https://dbfiddle.uk/xNgYefHA And @philipxy, with respect, people sometimes ask questions here because they don't completely understand their problem. – O. Jones Feb 17 '23 at 11:52
  • That's because I DON'T KNOW how to do this! I don't know what the terminology is. – Dan Feb 17 '23 at 11:52
  • It's not a matter of terminology. It's a matter of actually trying to say what you mean & of not saying what you know you don't mean. Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. Please act on the comments. – philipxy Feb 17 '23 at 11:56
  • I have updated the question, and the fiddle to show what I want to pull. https://dbfiddle.uk/rmQnYIWi - I want a row in the second results table for each of the rows in the first. – Dan Feb 17 '23 at 12:12
  • LEFT JOIN ON returns INNER JOIN ON rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN ON you want as part of an OUTER JOIN ON. After a LEFT JOIN ON a WHERE, INNER JOIN ON or HAVING that requires a right [sic] table column to be not NULL removes any rows with introduced NULLs, ie leaves only INNER JOIN ON rows, ie "turns OUTER JOIN into INNER JOIN". You have that. – philipxy Feb 17 '23 at 17:30
  • Please put everything needed to ask your question in your post, not just at a link or comment. "tabular initialization code" "Please clarify via edits, not comments." PS See [How do comment replies work?](https://meta.stackexchange.com/q/43019/266284) to learn to use `@x` to notify 1 non-sole non-poster commenter `x` per comment about that comment. Posters, sole commenters & followers of posts always get notified. Without `@` other commenters get no notification. – philipxy Feb 17 '23 at 17:38

1 Answers1

1

(For what it's worth, I believe cohort is an accepted term in various social science disciplines.)

Your problem would be made easier if your assignments table had a part column matching the part table in your cohort_units table. So let's start with a subquery to generate a virtual table with that column present.

SELECT assignments.user_id, assignments.cohort_id, 
       cohort_units.unit, cohort_units.part,
       assignments.draft1recdt
  FROM assignments
  JOIN cohort_units 
     ON assignments.unit = CONCAT(cohort_units.unit, cohort_units.part)

We'll use this subquery in place of assignments moving forward. This is a bit of a kludge, but it will make later work cleaner.

Next we need the number of parts in each unit. That's a simple aggregate:

SELECT COUNT(*) num_parts,
       cohort_id,
       unit
  FROM cohort_units
 GROUP BY cohort_id, unit

We can organize our query using common table expressions, like so.

WITH completed AS (
 SELECT assignments.user_id, assignments.cohort_id, 
       cohort_units.unit, cohort_units.part,
       assignments.draft1recdt
  FROM assignments
  JOIN cohort_units 

      ON assignments.unit = CONCAT(cohort_units.unit, cohort_units.part) 
),
partcount AS (
SELECT COUNT(*) num_parts,
       cohort_id,
       unit
  FROM cohort_units
 GROUP BY cohort_id, unit
)
SELECT completed.user_id, cohort_units.cohort_id, cohort_units.unit, 
       GROUP_CONCAT(completed.part) parts,
       COUNT(*) completed_parts,
       partcount.num_parts
  FROM cohort_units
  JOIN partcount
         ON cohort_units.cohort_id = partcount.cohort_id
        AND cohort_units.unit = partcount.unit
  JOIN completed
         ON completed.cohort_id = cohort_units.cohort_id
        AND completed.unit = cohort_units.unit
        AND completed.part = cohort_units.part
GROUP BY completed.user_id, cohort_units.cohort_id, cohort_units.unit, num_parts
HAVING COUNT(*) < partcount.num_parts

Here's a fiddle. https://dbfiddle.uk/FvGkiAnl

One of the tricks to this is the separate aggregate to get the part counts.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • You can put a `WHERE` clause right before the last `GROUP BY` clause, maybe `WHERE cohort_id <= 100` or something. That way you can get partial results. But you probably need some indexes on your tables to make this run faster. Take a look at [tag:query-optimization]. If I were you I'd optimize the two WITH queries first. – O. Jones Feb 17 '23 at 17:20
  • And, I should add that if you can rework that `assignments` table so it has a separate `part` column rather than concatenating the `unit` and `part` it probably will be much easier to speed things up with indexes. – O. Jones Feb 17 '23 at 17:37
  • Thanks again, I got the unit and part separated in the assignments table using SUBSTRING(assignid,1,CHAR_LENGTH(assignid)-1) AS unit, RIGHT(assignid,1) as part since the assignid field is just a concatenation of the unit and part (the part always being one letter). The query now runs as it should and pulls 3700 results which is exactly what I need. You sir are a Godsend and I appreciate your help and support. – Dan Feb 18 '23 at 09:54