2

I am working on a ticketing system where user can buy tickets for events. For these events info we are relying on different suppliers feed. Here the problem is each supplier would call each event in a different way but there will be some similarity.

So before inserting any event info into a table I need to find out are there any events with similar shows. If there are any similar events, we have some more filter mechanism where we would be checking the venue, dates and etc.

To find out the similar events at first step I am doing the following.

Splitting the new event name into string array (using cursor). Looping through every item and finding the events with similar name (using like). And inserting all those records into a temp table.

DECLARE @TmpTable TABLE (ProductHeaderID int, Name varchar(200))

DECLARE @TmpAddress TABLE (AddressId int)

DECLARE @Item Varchar(100)
DECLARE Items CURSOR LOCAL FOR select Item from dbo.SplitString(@ProductHeader,' ')

OPEN Items
FETCH NEXT FROM Items into @Item
WHILE @@FETCH_STATUS = 0
BEGIN

    If dbo.Trim(@Item) <> '-' and dbo.Trim(@Item) <> ' ' and LEN(dbo.Trim(@Item)) > 3
    Begin
        --Print @Item
        Insert into @TmpTable (ProductHeaderID,Name)
            Select ProductHeaderID,Name from Product.ProductHeaderRepository 
            Where Name LIKE '%[^a-z]'+dbo.trim(@Item)+'[^a-z]%'
    End
    FETCH NEXT FROM Items into @Item
END

CLOSE Items
DEALLOCATE Items

SplitString is an user defined function which returns a table.

ALTER function [dbo].[SplitString] 
(
    @str nvarchar(max), 
    @separator char(1)
)
returns table
AS
return (
with tokens(p, a, b) AS (
    select cast(1 as bigint),  cast(1 as bigint), charindex(@separator, @str)
    union all
    select p + 1,b + 1, charindex(@separator, @str, b + 1) from tokens where b > 0
)
select p-1 ItemIndex,substring(@str, a, case when b > 0 then b-a ELSE LEN(@str) end) AS Item from tokens
);

And the Trim function is below

ALTER FUNCTION [dbo].[TRIM](@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
BEGIN
RETURN LTRIM(RTRIM(@string))
END

But performance wise above mentioned logic taking lot of time. Just for 4k records it is taking more than 10 minutes. Could somebody help me in optimizing the above.

fthiella
  • 48,073
  • 15
  • 90
  • 106
Naresh
  • 2,667
  • 13
  • 44
  • 69
  • Can you add an example of the parameters you pass? And, why do you split the split the productheader? what is `dbo.trim`? – shahkalpesh Jun 29 '13 at 15:00
  • @shahkalpesh - Productheader is event name. I need to find the events with similar names. Some provider may say event with name 'Jaki Graham' and some body may say 'GRAHAM BONNETT'. So I need to find all the events with similar names. – Naresh Jun 29 '13 at 15:06
  • Are you trying to find event with similar words in it across the time? Wouldn't it be meaningful to search for event by time? I guess, that is what you are looking at. – shahkalpesh Jun 29 '13 at 15:08
  • We are doing that but that will come at the later stage. In the first step we are finding the events with similar names and there after we are filtering by event names. – Naresh Jun 29 '13 at 15:09
  • I dont have any specific tips.However - are you trying to find all events that matches any of the word? Also, wouldn't it better to have the string splitt-ed into a table, such that each word will become a row in a temporary table - whereby it can be joined with and `LIKE` can be used on it? Another suggestion is to reduce the calls such as `dbo.Trim` and store the results into a variable. Time things - as to how long does it take for each important step and which step is taking long. Lastly, it will be better to have the search scope limited to events of the day, than events of the past. – shahkalpesh Jun 29 '13 at 15:13
  • Maybe you might want to change your whole approach to using fuzzy string comparison algorithms like SOUNDEX instead of cutting strings into pieces and comparing these pieces. See for instance [this question](http://stackoverflow.com/questions/921978/fuzzy-matching-using-t-sql). – Andreas Jun 29 '13 at 15:21
  • Try running these steps in SQL, as if you are writing the above code without a stored procedure (with the example parameters that took 10 minutes) and see what step is taking long. – shahkalpesh Jun 29 '13 at 15:29
  • @shahkalpesh thanks for your inputs. I am calling the above function from another sql query which returns around 3500 records. For the whole 3500 records this function is taking nearly more than 10 minutes. – Naresh Jun 29 '13 at 15:43
  • @Naresh: Post your findings and steps you took that lead to reduction in time, for benefit of all. – shahkalpesh Jun 29 '13 at 15:58

1 Answers1

0

If I am not mistaken, you can replace the while loop with the following:

Insert into @TmpTable (ProductHeaderID,Name)
Select ProductHeaderID,Name from Product.ProductHeaderRepository 
join dbo.SplitString(@ProductHeader,' ') t on Name LIKE '%[^a-z]'+dbo.trim(t.Item)+'[^a-z]%'

this should improve performance.

Alexey
  • 909
  • 6
  • 11