0

I am creating a function for the purpose of address comparison in SQL. Right now I have a working function but it only works on one address comparison record at a time. Meaning, I feed the function parameters like Street1_Old, and Street1_New, City_Old, City_New, ... and then the output is a table with one record, and each column represents whether or not the Street1, Street2, City, State, Zip are a match. The function performs normalization of street suffix abbreviations, remove punctuations, and essentially tries to standardize the address strings as much as possible before comparing them.

This is the background explanation of my function. Essentially, I would like the output of the function to have many records, each representing the match columns for every address comparison I feed the function. So I am wondering if it is possible to feed the function a table instead of single parameter values. The function is very slow right now and I thought this might be a way to speed it up. All my code is attached below. I would appreciate if anyone has any thoughts on how to speed this function up / if it is possible to feed the function a table of values. I hope the code attached below will help this make more sense.

Current way the function is being used:

update bkas
set Address_Match_Flag = 1
from #bkas bkas
cross apply dbo.AddressCompare(BKA_Street1, BKA_Street2, BKA_City, BKA_State, BKA_Zip, 
                               BKA_Street1_Current, BKA_Street2_Current, BKA_City_Current, BKA_State_Current, BKA_Zip_Current,
                                     BKA_Id) ac     
where ac.Join_Id = BKA_Id
       and ac.Total_Match = 1

And to demonstrate, the output of the AddressCompare() function would look something like:

Street1_Match Street2_Match City_Match State_Match Zip Match Total_Match Join_Id
1 1 1 1 1 1 23

So if there are 100,000 records in #bkas that I want to perform address comparison on, I will need to call this function 100,000 times as a cross apply, each time feeding it the data from each record in #bkas. At least that is how I am understanding it as of now. However I think things would be much faster if I were able to feed the function the entire table, #bkas, and then have the output be a table as shown above, but with 100,000 records in it. That way the function is only called once. I would like to be able to feed the function any table as long as it meets the desired parameter requirements, i.e. having 5 columns which represent one address (street1, street2, city, state, zip) and 5 other columns representing the other address we wish to compare, and then some id column.

Possible new way to use the function if it were able to accept a table, and also the names of the columns we wish to use for the address data, or something like that:

   -- First gather all data from the function into a cursor
   select * 
       into #address_match_data
       from dbo.AddressCompare(#bkas, all 11 columns which we wish to use as the names like shown in the above example)

So then the cursor #address_match_data would have the same amount of records as in #bkas, something like:

Street1_Match Street2_Match City_Match State_Match Zip Match Total_Match Join_Id
1 1 1 1 1 1 1
0 1 1 1 1 0 2
... ... ... ... ... ... ...
1 1 1 1 1 1 100000

Attached below is the code structure I have for the function, AddressCompare():

create function [dbo].[AddressCompare]
(
@street1_t1 varchar(64),
@street2_t1 varchar(64),
@city_t1    varchar(64),
@state_t1   varchar(32),
@zip_t1     varchar(16),
@street1_t2 varchar(64),
@street2_t2 varchar(64),
@city_t2    varchar(64),
@state_t2   varchar(32),
@zip_t2     varchar(16),
@join_id  int

)
returns @matchtable table
(
Street1_Match bit, Street2_Match bit, City_Match bit, State_Match bit, Zip_Match bit, 
Total_Match bit,  Join_Id int
)
as
begin

-------------------------------------
-- Remove any unnecessary punctuation
set @street1_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@street1_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @street2_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@street2_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @city_t1    = trim(upper(replace(replace(replace(replace(replace(replace(@city_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @state_t1   = trim(upper(replace(replace(replace(replace(replace(replace(@state_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @zip_t1     = trim(upper(replace(replace(replace(replace(replace(replace(@zip_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))

set @street1_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@street1_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @street2_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@street2_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @city_t2    = trim(upper(replace(replace(replace(replace(replace(replace(@city_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @state_t2   = trim(upper(replace(replace(replace(replace(replace(replace(@state_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))
set @zip_t2     = trim(upper(replace(replace(replace(replace(replace(replace(@zip_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), '  ', ' ') ))

---------------------------------------------------------------
-- Shrink down spaces so that there is not more than 1 anywhere

while len(@street1_t1) - len(replace(@street1_t1, '  ', ' ')) >= 1  -- this loop will keep running until replacing double space with space does not change the length, i.e. no more double space.
begin
    set @street1_t1 = replace(@street1_t1, '  ', ' ')
end

while len(@street2_t1) - len(replace(@street2_t1, '  ', ' ')) >= 1  
begin
    set @street2_t1 = replace(@street2_t1, '  ', ' ')
end

while len(@city_t1) - len(replace(@city_t1, '  ', ' ')) >= 1    
begin
    set @city_t1 = replace(@city_t1, '  ', ' ')
end

while len(@street1_t2) - len(replace(@street1_t2, '  ', ' ')) >= 1  
begin
    set @street1_t2 = replace(@street1_t2, '  ', ' ')
end

while len(@street2_t2) - len(replace(@street2_t2, '  ', ' ')) >= 1  
begin
    set @street2_t2 = replace(@street2_t2, '  ', ' ')
end

while len(@city_t2) - len(replace(@city_t2, '  ', ' ')) >= 1
begin
    set @city_t2 = replace(@city_t2, '  ', ' ')
end

-------------------------------------------------------------
-- Expand all street suffixes using data in Codes table (RD -> ROAD, ST -> STREET, FIRST -> 1ST, ...)

declare @suffix_table table (id int, ctw_id int, cto_id int, ctw_shortdescr varchar(64), ctw_description varchar(64))
insert into @suffix_table
    select  row_number() over (order by ctw_id) as id,
            ctw.ctw_id,
            ctw.cto_id,
            ctw.ctw_shortdescr,
            ctw.ctw_description
        from Codes ctw
        where ctw.CTW_Type = 'Street Suffix'
            and upper(ctw.CTW_Reference) = upper('AddressCompare')


declare @suffixcnt int = (select count(*) from @suffix_table)
declare @counter int = 1

declare @abbrev varchar(32)
declare @long varchar(64)

while @counter <= @suffixcnt
begin
    set @abbrev = (select ctw_shortdescr from @suffix_table where id = @counter)
    set @long = (select ctw_description from @suffix_table where id = @counter)     

    set @street1_t1 = trim(replace(' ' + trim(@street1_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
    
    if @street2_t1 != ''
    begin
        set @street2_t1 = trim(replace(' ' + trim(@street2_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
    end

    set @city_t1 = trim(replace(' ' + trim(@city_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))

    -------------------

    set @street1_t2 = trim(replace(' ' + trim(@street1_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))

    if @street2_t2 != ''
    begin
        set @street2_t2 = trim(replace(' ' + trim(@street2_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
    end

    set @city_t2 = trim(replace(' ' + trim(@city_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))

    set @counter = @counter + 1
    
end

-----------------------------------------------------------------------
-------------------
-- Matching Process
-------------------

insert into @matchtable
    select  cast(iif(@street1_t1 = @street1_t2, 1, 0) as bit) as Street1_Match,
            cast(iif(@street2_t1 = @street2_t2, 1, 0) as bit) as Street2_Match,
            cast(iif(@city_t1 = @city_t2, 1, 0) as bit) as City_Match,
            cast(iif(@state_t1 = @state_t2, 1, 0) as bit) as State_Match,
            cast(iif(@zip_t1 = @zip_t2, 1, 0) as bit) as Zip_Match,
            cast(iif(@street1_t1 = @street1_t2 and @street2_t1 = @street2_t2 and @city_t1 = @city_t2 and @state_t1 = @state_t2 and @zip_t1 = @zip_t2, 1, 0) as bit) as Total_Match,
            cast(@join_id as int) as Join_Id
       
return
end
Dale K
  • 25,246
  • 15
  • 42
  • 71
DizzleBeans
  • 173
  • 6
  • This is a slippery slope. Take a peek at https://stackoverflow.com/questions/41249742/address-standardization-within-a-database/41249971#41249971 – John Cappelletti May 25 '23 at 18:05
  • @JohnCappelletti Thanks for the response. I know address matching is slippery. For the purpose of this question I am really just wondering if the type of output i described is possible. I know the address comparison is not perfect, but the function above does work well for the data i have right now. Matching about 80% of the data that has an address mismatch. However it is very slow. So mainly looking for ways to speed it up. – DizzleBeans May 25 '23 at 18:09
  • Feeding it a table of values won't speed it up if the function is already slow... – Dale K May 25 '23 at 21:47
  • 1
    You might find it you do you replacements in a query i.e. replace all the values in one query rather than a bunch of set statements it might go faster... and aim to keep everything set based as possible. Even just using a single query to carry out your replacements would mean you could use a single loop rather than loads of loops (I recon you use a recursive CTE to remove all the loops TBH). Also consider using CLR functions i.e. functions that run native in .NET where you have a lot better string functions. – Dale K May 25 '23 at 21:48
  • @DaleK I agree. However I would have to have my parameter data in a table of some sort to perform those replacements all at once, correct? That is related to what I am trying to accomplish. A way to have the output of the function contain many records and not just one. Do you see what I mean? – DizzleBeans May 25 '23 at 21:50
  • No, a select can operate on a bunch of variables e.g. `select @Param1 = replace(@Param1,' ',''), @Param2 = replace(@Param2,' ','')` etc – Dale K May 25 '23 at 21:54
  • @DaleK in response to your first comment: although it might not speed it up, it is still possible to feed a function a table in essence? It does not seem like that is possible from what i've researched. – DizzleBeans May 25 '23 at 21:54
  • Pretty sure you can pass a [table valued parameter](https://learn.microsoft.com/en-us/sql/relational-databases/tables/use-table-valued-parameters-database-engine?view=sql-server-ver16) to a function. However I would suggest that for what you are trying to do you would be better off with a stored procedure - it will give you more performance tuning options, because there are less limitations. – Dale K May 25 '23 at 21:56
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253833/discussion-between-dizzlebeans-and-dale-k). – DizzleBeans May 25 '23 at 22:00
  • 1
    The act of calling a function in SQL Server in negligible compared to procedural complexity of the function code. To naswer your question: NO, the function will not become faster by calling it only once. In order to make this function noticeably faster you will need to change it into a iTVF function (inline table valued function). This will require a major rewrite to make it a single set based operation. You will have to consider trade offs, namely it will be practically unmaintainable (i.e. the code will be unreadable ). – Alex May 25 '23 at 23:14
  • 1
    You can get a major perf improvement by making this query ` and upper(ctw.CTW_Reference) = upper('AddressCompare')` [SARGABLE](https://dba.stackexchange.com/questions/162263/what-does-the-word-sargable-really-mean). Unless you running case sensitive collation, you do not need `upper()` function. If you do run case sensitive collation, then make sure that your Codes table has all values in the proper case. – Alex May 25 '23 at 23:20

0 Answers0