0

I want to export some data from the DB.

Basically what I want to say is this:

1- Select mbr_name from the members table

2- Choose the ones that exist at the course_registration table (based on mbr_id)

3- Join the course_registration ids with course_comments table

Then I need to apply these WHERE condtions as well:

1- Make sure that crr_status at course_registration table is set to completed

2- Make sure that crr_ts at course_registration table is between "2021-03-07 00:00:00" AND "2022-03-17 00:00:00"

3- Make sure that crm_confirmation from course_comments table is set to accept

So I tried my best and wrote this:

SELECT members.mbr_name
FROM members
INNER JOIN course_registration AS udt ON members.mbr_id = udt.crr_mbr_id 
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
WHERE udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
AND dot.crm_confirmation = "accept";

But this will give wrong data somehow.

The actual number of members that have all these conditions are 12K but this query gives me 120K results which is obviously wrong!

So what's going wrong here? How can I solve this issue?


UPDATE:

Here are the keys of each table:

members (mbr_id (PK), mbr_name) 
course_registration (crr_id (PK), crr_mbr_id (FK), crr_cor_id (FK), crr_status)
course_comments (crm_id (PK), crm_reference_id (FK), crm_confirmation)
Pouya
  • 114
  • 1
  • 8
  • 36
  • If your wanted result only is from members -table, you could try a subselect instead... – Daniel Stackenland Mar 30 '22 at 11:10
  • You could also have just done a DISTINCT member name and ID to remove the cardinality / duplicates if that is all you wanted. However, O. Jones answer works via the subquery to prequalify. One additional question. What if someone signed up for 3 classes and was NOT accepted in any one of them, or has some pending status? Do you care? Or do you want ALL classes signed up and accepted? Or is this just to check enrollment for the current semester being considered? – DRapp Mar 30 '22 at 12:14
  • 2
    Please in code questions give a [mre]--cut & paste & runnable code & example input; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Mar 30 '22 at 12:54
  • 2
    can you give us an example of what the results should look like and what results you are getting? I feel like you should use a recursion to accomplish what you need – Demeteor Apr 04 '22 at 08:12
  • 1
    The fact that none of the proposed query solutions so far returns your expected result, after all this time, strongly suggests you haven't provided enough information. **What's noticeably missing is a data sample and the expected results (using the sample)**. Descriptions are good - but showing the actual data and desired results is - much better. – SOS Apr 05 '22 at 20:33
  • have you noticed that both `udt.crr_cor_id` and `dot.crm_reference_id` are foreign keys? and you're using the join on them `INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id`, there might be your issue – Antoine Baqain Apr 05 '22 at 23:16
  • Possible duplicate of [Two SQL LEFT JOINS produce incorrect result](https://stackoverflow.com/q/12464037/3404097) – philipxy Apr 06 '22 at 04:03

6 Answers6

2

You have a so-called cardinality problem. JOINs can, when multiple rows on the one table match a single row in the other table, cause the result set to have multiple rows. Your JOIN as written will generate many rows: members x courses x comments. That's what JOIN does.

It looks like you want exactly one row in your resultset for each member who ...

  • has completed one or more courses meeting your criterion.
  • has submitted one or more comments.

So let's start with a subquery. It gives the mbr_id values for members who have submitted one or more comments on one or more courses that meet your criteria.

        SELECT udt.crr_mbr_id
          FROM course_registration udt
          JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
         WHERE udt.crr_status = "completed"
           AND udt.crr_ts >= "2021-03-07 00:00:00"
           AND udt.crr_ts < "2022-03-17 00:00:00"
           AND dot.crm_confirmation = "accept"
         GROUP BY udt.mbr_id

You use the results of that subquery to find your members. The final query is

SELECT members.mbr_name
  FROM members
 WHERE members.mbr_id IN (
        SELECT udt.crr_mbr_id
          FROM course_registration udt
          JOIN course_comments dot ON  udt.crr_cor_id = dot.crm_reference_id
         WHERE udt.crr_status = "completed"
           AND udt.crr_ts >= "2021-03-07 00:00:00"
           AND udt.crr_ts < "2022-03-17 00:00:00"
           AND dot.crm_confirmation = "accept"
         GROUP BY udt.mbr_id )
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Hey dude, thanks for the answering, but the number of results that is shown is only `7808`. However the expected result is `14000` records. And this is because it shows a user only one time EVEN if he had registered in 3 different courses! So how to solve this? – Pouya Apr 03 '22 at 10:38
2

As you only want to select Member name you can try as below if this gives required result

select m.mbr_name
  from Members m
  where Exists ( select 1 from Course_Registration cr 
                               join Course_Comments cm on cr.crr_cor_id = cm.crm_reference_id
                  where cr.crr_mbr_id = m.mbr_id
                    And cr.crr_status = "completed" AND cr.crr_ts >= "2021-03-07 00:00:00" AND cr.crr_ts < "2022-03-17 00:00:00"
                    AND cr.crm_confirmation = "accept";
                 );


Ramesh
  • 438
  • 4
  • 13
1

My first guess, without knowing the context, is that:

  • a member can register to one or more courses,
  • each course can have one or more comments.

If this is the case, you are getting way more tuples due to redundancy. In that case you just need to stick a DISTINCT right after your first SELECT.

Furthermore, since the JOIN is the most resource-expensive operation in sql, I would first filter the data and then leave any join as the last operation to improve efficiency. Something like this:

SELECT 
    members.mbr_name 
FROM
    (
    SELECT DISTINCT
        crm_reference_id
    FROM 
        course_comments
    WHERE 
        crm_confirmation = 'accept'
    ) accepted_comments
INNER JOIN 
    (
    SELECT DISTINCT
        crr_mbr_id,
        crr_cor_id
    FROM 
        course_registration
    WHERE
        crr_status = 'completed'
    AND
        crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
    ) completed_courses 
ON 
    accepted_comments.crm_reference_id = completed_courses.crr_cor_id
INNER JOIN 
    members 
ON 
    members.mbr_id = completed_courses.crr_mbr_id
lemon
  • 14,875
  • 6
  • 18
  • 38
1

I would start at the registration FIRST instead of the members. By getting a DISTINCT list of members signing up for a course, you have a smaller subset. From that too, joining to the comments for just those accepted gives you a final list.

Once you have those two, join back to members to get the name. I included the member ID as well as the name because what if you have two or more "John" or "Karen" names in the registration. At least you have the ID that confirms the unique students.

select
        m.mbr_name,
        m.mbr_id
    from
        ( select distinct
                cr.crr_mbr_id
            from
                course_registration cr
                    JOIN course_comments cc 
                        ON cr.crr_cor_id = cc.crm_reference_id
                        AND cc.crm_confirmation = 'accept'
            WHERE 
                    cr.crr_status = 'completed'
                AND cr.crr_ts >= '2021-03-07' 
                AND cr.crr_ts < '2022-03-17' ) PQ
        JOIN members m
            ON PQ.crr_mbr_id = m.mbr_id 
DRapp
  • 47,638
  • 12
  • 72
  • 142
0

Try using this and if it not works then try using 'between' for date field (crr_ts).

select mbr.mbr_name from
(
select * from course_registration AS udt
INNER JOIN course_comments AS dot ON udt.crr_cor_id = dot.crm_reference_id
where dot.crm_confirmation = "accept" AND udt.crr_status = "completed" AND udt.crr_ts >= "2021-03-07 00:00:00" AND udt.crr_ts < "2022-03-17 00:00:00"
)x
INNER JOIN  members mbr on mbr.mbr_id = x.crr_mbr_id
0

Try this:

SELECT *
FROM members M
INNER JOIN course_registration CR
ON CR.crr_mbr_id = M.mbr_id
AND CR.crr_status = 'completed'
AND CR.crr_ts BETWEEN '2021-03-07 00:00:00' AND '2022-03-17 00:00:00'
WHERE EXISTS(
    SELECT * FROM course_comments CC
    WHERE CC.crm_confirmation = 'accept'
    AND CC.crm_reference_id = CR.crr_cor_id
)
ORDER BY M.mbr_id;
memite7760
  • 69
  • 9