0

I have the following tables:

Users

user_id  course_id  completion_rate
1        2          0.4
1        23         0.6
1        49         0.5
...      ...        ...

Courses

course_id  title
1          Intro to Python
2          Intro to R
...        ...
70         Intro to Flask

Each entry in the user table represents a course that the user took. However, it is rare that users have taken every course.

What I need is a result set with user_id, course_id, completion_rate. In the case that the user has taken the course, the existing completion_rate should be used, but if not then the completion_rate should be set to 0. That is, there would be 70 rows for each user_id, one for each course.

I don't have a lot of experience with SQL, and I'm not sure where to start. Would it be easier to do this in something like R?

Thank you.

2 Answers2

2

You should first cross join the courses with distinct users. Then left join on this to get the desired result. If the user hasn't taken a course the completion_rate would be null and we use coalesce to default a 0.

select c.course_id,cu.user_id,coalesce(u.completion_rate,0) as completion_rate
from courses c
cross join (select distinct user_id from users) cu
left join users u on u.course_id=c.course_id and cu.user_id=u.user_id
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
  • Impressive. More info on cross join can be found [here](http://www.w3resource.com/mysql/advance-query-in-mysql/mysql-cross-join.php) – Blue Apr 21 '17 at 23:30
  • Running this query on the dataset I have was taking too long so I just wrote a Python script to do it. This is a great answer though, thank you! –  Apr 23 '17 at 00:12
0

Step1: Take the distinct client_id from client_data (abc) and do 1 on 1 merge with the course data (abc1) . 1 on 1 merge helps up write all the courses against each client_id

Step2: Merge the above dataset with the client info on client_id as well as course

create table ans as
select p.*,case when q.completion_rate is not null then q.completion_rate else 0   
           end as completion_rate
from
(
   select a.client_id,b.course from
      (select distinct client_id from abc) a
   left join
      abc1 b
   on 1=1
) p
left join
abc q
on p.client_id = q.client_id and p.course = q.course
order by client_id,course;  

Let me know in case of any queries.

G.Arima
  • 1,171
  • 1
  • 6
  • 13
  • U mean half line less? The main question was about 1=1 merge, For rest of query anything will do. – G.Arima Apr 22 '17 at 07:29
  • Yeah that bit's unnecessary too. See other answer. – Strawberry Apr 22 '17 at 07:32
  • Thats the same thing only. Even i liked his style therefore I liked his answer also but you should understand everyone has different style of coding. Tell me if something is wrong with my query. – G.Arima Apr 22 '17 at 07:35
  • Well, in every other respect, I *think* it's identical to the other answer. – Strawberry Apr 22 '17 at 08:00