Let's assume we have a database like this:
Project_tbl
:
----------------- id | Project_name ----------------- 1 | A 2 | B 3 | C -----------------
personel_project_tbl
:
-------------------- user_id | Project_id -------------------- 1 | 1 2 | 2 3 | 1 3 | 2 2 | 3 --------------------
instrument_project_tbl
:
-------------------------- instrument_id | Project_id -------------------------- 1 | 1 1 | 2 2 | 2 2 | 1 1 | 3 --------------------------
Now, I need to sort the list of projects and rank them with regard to their similarity to the project A.
For example:
A and B have 1 users in common over the 3 users and 2 instruments over the 2 instrument so their similarity ranking is (1/2 + 2/2) / 2 = 75%
A and C have no user in common but have 1 over 2 instruments so it will be (1/2)/2 = 25%
So B is more similar than be and output should be
-------------- Project | Rank -------------- 2 | 75 3 | 25
That's the first solution came to my mind...
If I did it in PHP and MySQL, it would be something like:
for all tables as table_x
for all projects (except A) as prj_y
unique = (Select distinct count(items) from table_x where project is A)
count += (Select distinct count(items) from table_x
where project is prj_x and items are in
(select distinct items from table_x where project is a)
)/unique
So the complexity would be O(n2) and with indexing the select also would cost O(log n) which wouldn't be affordable.
Do you have any idea to do it totally in MySQL or do it in a better and faster way?
******** More information and notes:**
I'm limited to PHP and MySQL.
This is just an example, in my real project the tables are more than 20 tables so the solution should have high performance.
this question is the supplementary question for this one : Get the most repeated similar fields in MySQL database if yr solution can be used or applied in a way for both of them (somehow) It would be more than great. I want to multiply the value of related projects with the similarity of items to get the best option...
In conclusion, these two questions will : get the most related projects, get the similar items of all projects and find the most similar item for current project where the project is also similar to the current one! yo
Thanks for your intellectual answers, its really appreciated if you could shed some light on the situations