I am currently trying to figure out how to calculate the similarity between two records. My first record would be from a deactivated advertisement - so I want to find e.g. the 10 most similar advertisement regarding to some VARCHAR-fields equalness.
The thing, I can't figure out is, if there is any MySQL function, that can help me in any way - or if I need to compare the strings in some weird way?
EDIT #1
Similarity would be defined by these fields:
Title (weight: 50 %)
Content (weight: 40 %)
Category (weight: 10 %)
EDIT #2
I want the calculation to be like this:
Title: Words that match in the title field (only words >2 letters are matched).
Description: Words that match in the title field (only words >2 letters are matched).
Catgory: Match the category and if that doesn't match match the parent category with less weight :)
An equation of this could be:
#1 is the old, inactive post, #2 is the active post:
#2 title matches #1 title in 3 words out of #2's total of 10 words. That gives 30 % match = 30 points.
#2 description matches #1 description in 10 words out of #2's total of 400 words. That gives a 4 % match = 4 points.
#2 category doesn't match #1's category, therefore 0 % match. That gives 0 points.
Then the sum would be 34 points for #2. :)
Edit #3
Here's my query - but it doesn't return different rows, but a lot of the same row.
SELECT
a.AdvertisementID as A_AdvertisementID,
IF(a.Topic LIKE a2.Topic, 50, 0) + IF(a.Description LIKE a2.Description, 40, 0) + IF(a.Cate_CategoryID LIKE a2.Cate_CategoryID, 10, 0) as A_Score,
a.AdvertisementID as A_AdvertisementID,
a.Topic as A_Topic,
LEFT(a.Description, 300) as A_Description,
a.Price as A_Price,
a.Type as A_Type
FROM
".DB_PREFIX."A_Advertisements a2,
".DB_PREFIX."A_Advertisements a
WHERE
a2.AdvertisementID <> a.AdvertisementID
AND
a.AdvertisementID = :a_id
ORDER BY
A_Score DESC