Take a look at my answer about using LIKE
operator here
It could be quite performant if you use some tricks
You can gain much speed if you play with collation, try this:
SELECT DISTINCT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n on n.[Id] = p.[Id]
WHERE EXISTS (
SELECT 'x' x
FROM [dbo].[people_NAME] n2
WHERE n2.[Id] != p.[id]
AND
lower(n2.[FirstName]) collate latin1_general_bin
LIKE
'%' + lower(n1.[FirstName]) + '%' collate latin1_general_bin
)
As you can see we are using binary comparision instead of string comparision and this is much more performant.
Pay attention, you are working with people's names, so you can have issues with special unicode characters or strange accents.. etc.. etc..
Normally the EXISTS
clause is better than INNER JOIN
but you are using also a DISTINCT
that is a GROUP BY
on all columns.. so why not to use this?
You can switch to INNER JOIN
and use the GROUP BY
instead of the DISTINCT
so testing COUNT(*)>1
will be (very little) more performant than testing WHERE n2.[Id] != p.[id]
, especially if your TOP clause is extracting many rows.
Try this:
SELECT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n on n.[Id] = p.[Id]
INNER JOIN [dbo].[people_NAME] n2 on
lower(n2.[FirstName]) collate latin1_general_bin
LIKE
'%' + lower(n1.[FirstName]) + '%' collate latin1_general_bin
GROUP BY n1.[Id], n1.[FirstName]
HAVING COUNT(*)>1
Here we are matching also the name itself, so we will find at least one match for each name.
But We need only names that matches other names, so we will keep only rows with match count greater than one (count(*)=1 means that name match only with itself).
EDIT: I did all test using a random names table with 100000 rows and found that in this scenario, normal usage of LIKE operator is about three times worse than binary comparision.