1

Currently i am working on a project where i need to match people based on the categories of food they like:

This is the scenario:

I have a list of USERS and their favorite foods in my database. The database structure is as follows:

USERS(id,name,email,gender,dob)
Fav_Food (id,user_name,food,desc)

Data for users table:

1, Alice, alice@lala.com, female, 11 Oct 2010

2, Bob, bob@lala.com, male, 12 Oct 2010

3, Jason, jason@lala.com, male, 13 Oct 2010

Data for fav_foods table:

1, Alice, apple, some desc

2, Alice, banana, some desc

3, Alice, Pear, some desc

4, Bob, apple, some desc

5, Bob, custard cake, some desc

6, Jason, banana,some desc

6,Jason,apple,some desc

Imagine that i am Alice where i like apple,banana & pear. How would i be able to match people based on the favorite food? For example, i first check if anyone likes apple,banana and pear (inclusive of all three) and than go with the permutation of only two combination (apple,banana)(apple,pear)(banana,pear)(banana,apple) ....and so on.....

Imagine it like a Venn diagram where the interaction is what i am interested. I am interested to suggest users with the most matched. Is there any algorithm available that i can use for php?

Cœur
  • 37,241
  • 25
  • 195
  • 267
madi
  • 5,612
  • 5
  • 36
  • 48
  • 1
    Look into the "Jaccard coefficient", with which you can measure the similarity of two given things. – deceze Oct 11 '11 at 09:13
  • i am reading it but have no idea how could that work with a list of data ...hmmmmm – madi Oct 11 '11 at 09:28

3 Answers3

4

what you could do is join the fav_foods table to itself and then count the matches:

SELECT u2.user_name,count(*) as likeness 
FROM `fav_food` as u1 INNER JOIN `fav_food` as u2 ON (u1.user_name = 'alice' AND 
                                                 u1.food = u2.food AND 
                                                 u2.user_name != "alice")
GROUP BY u2.user_name 
ORDER BY likeness DESC

it will output:

user_name   likeness
jason       2
bob         1

the trick is on the conditions of the inner join... =)

Hope this helps

EDITED: oops i corrected the query =)

pleasedontbelong
  • 19,542
  • 12
  • 53
  • 77
1

You could take a look into Matching within Bipartite Graphs, but I doubt if this is the most efficient algorithm to use.

pderaaij
  • 1,337
  • 12
  • 32
1

Personally I would make a users HABTM food relation instead of storing the same strings over and over.

Second, look here for a good explanation of Jaccard coefficient

If you decide to go with a HABTM relation, take a look at this documentation: http://book.cakephp.org/view/1044/hasAndBelongsToMany-HABTM The last example on that page may be most interesting to you.

You can use on the fly binds to get all users that have, for example, apple and banana as favorite food.

Or you can just get all users and their food and compare them using the Jaccard coefficient.

Just throwing some possibilities in, hope it helps!

Community
  • 1
  • 1
Kevin Vandenborne
  • 1,397
  • 1
  • 10
  • 28