-1

There are two id descriptions which are similar that is have say 80% similarity. I need both to be given same id.

There are other id descriptions having say 60% similarity. These should retain their own ids. Once an id desc has been considered and modified, it should not be taken as a reference. Further example:

id       id description
1        pepsodent
2        pepsodent salt
3        pepsod
4        pepsodent and salt
5        peps

Now, pepsodent matches with pepsodent salt.therefor both should be given id as 1 Now as pepsodent salt has already been modified,it cannot be used as a scale of reference further.

Community
  • 1
  • 1
  • 1
    If you can define what 80% similarity and/or 60% similarity actually mean, then we can help. You need to define precisely what the rules are that determine which descriptions should be matched with one another – kjmerf Aug 01 '16 at 18:27
  • Can you show your existing SQL or VBA code that you have tried for this? That might help guide people to give answers that are more suitable for you. – halfer Aug 01 '16 at 19:04
  • 1
    You might try implementing this algorithm and see if it meets your needs for determining "similarity" http://stackoverflow.com/questions/4243036/levenshtein-distance-in-excel – Tim Williams Aug 01 '16 at 19:59

1 Answers1

0

As I said in my comment above, you need to define exactly what the rules are for matching two records. In this example, I am giving the a New ID to any records that contain the entire string 'pepsodent'. The New ID for these records will be 999 but you can modify as you see fit:

SELECT ID, ID_Description, 
CASE
WHEN ID_Description LIKE 'Pepsodent%' THEN 999
ELSE ID
END AS New_ID
FROM Table
kjmerf
  • 4,275
  • 3
  • 21
  • 29
  • Rules for matching are the percentage similarity between any 2 descriptions. If %tage similarity is say greater than 80% only then shall the id be modified else not.We start reading the respective characters from left to right.Say if the description is for 100 characters,only if 80 match shall it be changed. Now the product name may be in between the statement too.i.e. not required thet description should star with 'pepsodent' as in example above. Even the white spaces are to be considered('_ 'or', 'or'!' and so on) – kaushik Aug 02 '16 at 05:50
  • So which ID should the string 'peds' be matched with? All of those four characters are contained within each of your examples – kjmerf Aug 02 '16 at 14:12
  • pepsodent comes first in the table.Thus it is matched to rest of the table.Now assume'pepsodent salt' has matched.It is therefore given id as 1.This id obtained is not compared further i.e. it is fixed.Pespsodent is then compared to rest of the table. Peps shall be compared to pepsodent. To have a match we assume more than 80% should be similar. – kaushik Aug 02 '16 at 16:18