3

The image shows the structure of my table. The first line means tutorB gives 10 marks to studentD. The second line means tutorE does not give any marks to studentD yet.

enter image description here

How can I generate the following table? I have referenced another post in stackoverflow.com. Collaborative filtering in MySQL? Yet, I am still quite confused.

enter image description here

From the image shown above, o means recommended, which the rate is higher than or equal to 7; x means not recommended, which the rate is less than 7.

For example, the tutorB give studentD 10 marks, therefore, from the second line in the image, we can see there is a "o" in column StudentD. ( And other three rows's data are just randomly assigned now.)

Now, if I want to recommend a student for Tutor A. The ranks ( or similarity) of TutorB, C and D are 0,2 and 3 respectively.

How can I generate a SQL such that I can able to convert the rate to "o" and "x" and calculate the rank. And, the most important, I want to recommend StudentH to TutorA as from the image.

How should I modify the code from the previous post? And, if my idea mentioned above correct?

Thanks.

============================================================================

EDITED

I have the following data in the database. The first row means 10 marks is given by tutorA to studentC.

enter image description here

I convert it as another table for a better understanding. v is the value of Rate.

enter image description here

create temporary table ub_rank as 
select similar.NameA,count(*) rank
from tbl_rating target 
join tbl_rating similar on target.NameB= similar.NameB and target.NameA != similar.NameA
where target.NameA = "tutorA"
group by similar.NameA;

select similar.NameB, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.NameA = similar.NameA 
left join ub target on target.NameA = "tutorA" and target.NameB = similar.NameB
where target.NameB is null
group by similar.NameB
order by total_rank desc;

select * from ub_rank;

The code above is referenced from Collaborative filtering in MySQL?. I have a few questions.

  1. There are 2 parts in the SQL. I can select * from the first part. However, if I enter the whole SQL as shown above, the system alerts Table 'mydatabase.ub' doesn't exist How should I modify the code?

  2. The code will find the similarity. How should I change the code, such that if the marks are less that 7, it changes to o, else change to v , and count the similarity of a given user?

enter image description here

Community
  • 1
  • 1
HUNG
  • 525
  • 7
  • 17
  • What does the last line in your table mean? Is that the "rating" given by StudentD to TutorB? Also, how many students are in your table? Do you really want a column for every existing student? – BellevueBob Mar 24 '13 at 21:57
  • @BellevueBob Yes, it means the "rating" given by StudentD to TutorB. The second image above shows Recommending Students part, which I used the data similar with those in first 2 lines of first image. On the other hand, when I perform the Recommendating Tutors part, I will use the data from the third to the last row of the DB table data. For the number of students, I have not yet consider well. I think I will try to figure out the table first and try to limit the size later. – HUNG Mar 24 '13 at 22:05

3 Answers3

1

Your DB schema is actually not very easy to work with.

Here's a query to get an exhaustive rating table:

SELECT Tutor.Name, Student.Name, 
  CASE WHEN Rating.Rate IS NULL THEN ''
    WHEN Rating.Rate > 6 THEN 'o'
    ELSE 'x' END
FROM (
    SELECT DISTINCT NameB AS Name
    FROM tbl_rating 
    WHERE RoleB='Tutor'
    UNION
    SELECT DISTINCT NameA AS Name
    FROM tbl_rating
    WHERE RoleA='Tutor'
    ORDER BY Name) AS Tutor
CROSS JOIN (
    SELECT DISTINCT NameB AS Name
    FROM tbl_rating 
    WHERE RoleB='Student'
    UNION
    SELECT DISTINCT NameA AS Name
    FROM tbl_rating
    WHERE RoleA='Student'
    ORDER BY Name) AS Student
LEFT JOIN tbl_rating AS Rating
ON Tutor.Name = Rating.NameA
AND Student.Name = Rating.NameB
ORDER BY Tutor.Name, Student.Name

The above query works by extracting from the table the list of all tutors (first subquery aliased to Tutor), and the list of all students (second subquery Student), do a product of both sets to obtain all the possible combination of tutor and student. Then it does an outer join with the rating table, which associate finds all the ratings done by students on tutors, and fill in with NULL non existent ratings. (The query to obtain the opposit rating - ie. student rating by tutors - can be obtained by swapping NameA and NameB in the LEFT JOIN clauses). The CASE turns numerical (or null) ratings to symbols as requested.

For similarities, we need to add two more joins:

  • one more on Tutor,
  • and another one on Rating

thus giving:

SELECT T1.Name AS Tutor1 , T2.Name AS Tutor2, 
  SUM( CASE
    WHEN (R1.Rate > 6 && R2.Rate > 6) || 
                (R1.Rate < 7 && R2.Rate < 7) THEN 1
    ELSE 0 END) AS SIMILARITY
FROM (
    SELECT DISTINCT NameB AS Name
    FROM tbl_rating 
    WHERE RoleB='Tutor'
    UNION
    SELECT DISTINCT NameA AS Name
    FROM tbl_rating
    WHERE RoleA='Tutor'
    ORDER BY Name) AS T1
CROSS JOIN (
    SELECT DISTINCT NameB AS Name
    FROM tbl_rating 
    WHERE RoleB='Tutor'
    UNION
    SELECT DISTINCT NameA AS Name
    FROM tbl_rating
    WHERE RoleA='Tutor'
    ORDER BY Name) AS T2
CROSS JOIN (
    SELECT DISTINCT NameB AS Name
    FROM tbl_rating 
    WHERE RoleB='Student'
    UNION
    SELECT DISTINCT NameA AS Name
    FROM tbl_rating
    WHERE RoleA='Student'
    ORDER BY Name) AS Student
LEFT JOIN tbl_rating AS R1
ON T1.Name = R1.NameA
AND Student.Name = R1.NameB
LEFT JOIN tbl_rating AS R2
ON T2.Name = R2.NameA
AND Student.Name = R2.NameB
WHERE Tutor1 < Tutor2
GROUP BY Tutor1, Tutor2
ORDER BY Tutor1, Tutor2

You could make these queries much more efficient by extracting the students and tutors specific data in their own tables, split the rating table in student ratings and tutors ratings, and use foreign keys:

Table student : Id | Name
Table tutor: Id  | Name
Table tutor_rating: StudentId | TutorId | Rate
Table student_rating: StudentId | TutorId | Rate

and possibly a tutor_similiarity table to avoid recomputing the whole dataset all the time, with a couple of triggers on the rating tables to update it (the similarity computation would be then incremental, and queries would just dump its content).

Table tutor_similarity: TutorId1 | TutorId2 | Similarity
didierc
  • 14,572
  • 3
  • 32
  • 52
  • I think there's something not quite right with my answer, regarding the JOIN somewhere, but I can't pinpoint the issue without testing it. – didierc Mar 24 '13 at 23:11
  • no need to use "similar" as mentioned in http://stackoverflow.com/questions/2440826/collaborative-filtering-in-mysql ? actually, this method fits my solution very much, but I have some extra conditions changed above, therefore , not sure how to modify it. – HUNG Mar 24 '13 at 23:13
  • sorry, my answer isn't correct, I'll try tomorrow. – didierc Mar 24 '13 at 23:44
1

Shamelessly borrowing from the answer to this previous question, see if this does the trick:

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'max(case when NameB = ''',
      NameB,
      ''' then (case when rate >= 7 then ''x'' else ''o'' end) else '' '' end) AS ',
      replace(NameB, ' ', '')
    )
  ) INTO @sql
from tbl_rating
where RoleA = 'Tutor';

SET @sql = CONCAT('SELECT NameA, ', @sql, 
' from tbl_rating
where RoleA = ''Tutor''
group by NameA');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Here is a SQL Fiddle.

Community
  • 1
  • 1
BellevueBob
  • 9,498
  • 5
  • 29
  • 56
0

This is really a comment but it is too long for a comment.

First, you cannot easily create a table with a variable number of columns. Do you know the columns in advance? In general, you represent a matrix the way you do in your original table . . . the "x" and "y" values are columns and the value goes in a third column.

Second, is the x and o based on the rating from the tutor to the student or vice versa? Your question is entirely ambiguous.

Third, to convert a rating to an "x" or "o", just use a case statement:

select (case when rating >= 7 then 'x' else 'o' end)

Fourth, you say the similarities from A to B, C, and D are 0, 2, and 3 respectively. I have no idea how you are getting this from the matrix that you show. If it is by overlap of "x"s, then the values would seem to be 0, 1, and 2.

My final conclusion is that you don't need to create a matrix like that at all because you already have the data in the correct format.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786