1

I have a ranking script that is working just fine - whereas it ranks the top 30 products that sale month over month, but in the last few months my data has changed a bit. So the product name went from:

    product_name 

to

   product_name@

where the @ is some sort of special character. Note that the special character is not always only one character, sometimes there is a space between the special character and the name. Additionally, the product name does not always end in a letter, occasionally it ends in a number. This oddity in the data appears to happen at random and to varied product names. I have over 50,000 products in this table. Is there a way that I can still group by product? Or perhaps trim the special character off?

windsormatic
  • 173
  • 1
  • 2
  • 8
  • 1
    You'd be best sorting out your data. If not, take a look at the SOUNDEX function http://stackoverflow.com/questions/37744220/sql-similar-data-in-column/37745913#37745913 – Rich Benner Jul 07 '16 at 14:55
  • Can any of these special characters appear in the product name itself? Do you have a full list of these characters? – PM 77-1 Jul 07 '16 at 14:56
  • Its almost always a copyright, tm or reserved symbol. Those symbols are occasionally inside the product name themselves - but have historically been there so they are not causing any problems. – windsormatic Jul 07 '16 at 15:01

1 Answers1

0

If those are trailing symbols, take a look at this solution:

SELECT RTRIM(SUBSTRING(Name, 1, CASE WHEN PATINDEX('%[$#@]%', Name)!=0 THEN PATINDEX('%[$#@]%', Name)-1 ELSE LEN(Name) END)) Word, COUNT(*) Total
FROM (VALUES
    ('Cat $#@'),
    ('Cat $'),
    ('Dog'),
    ('Dog$'),
    ('Cat'),
    ('Cat ')) T(Name)
GROUP BY RTRIM(SUBSTRING(Name, 1, CASE WHEN PATINDEX('%[$#@]%', Name)!=0 THEN PATINDEX('%[$#@]%', Name)-1 ELSE LEN(Name) END))

Words are cut by any special character in brackets [$#@] and RTrimmed to remove unnecessary spaces.

Paweł Dyl
  • 8,888
  • 1
  • 11
  • 27