I am creating a function for the purpose of address comparison in SQL. Right now I have a working function but it only works on one address comparison record at a time. Meaning, I feed the function parameters like Street1_Old, and Street1_New, City_Old, City_New, ... and then the output is a table with one record, and each column represents whether or not the Street1, Street2, City, State, Zip are a match. The function performs normalization of street suffix abbreviations, remove punctuations, and essentially tries to standardize the address strings as much as possible before comparing them.
This is the background explanation of my function. Essentially, I would like the output of the function to have many records, each representing the match columns for every address comparison I feed the function. So I am wondering if it is possible to feed the function a table instead of single parameter values. The function is very slow right now and I thought this might be a way to speed it up. All my code is attached below. I would appreciate if anyone has any thoughts on how to speed this function up / if it is possible to feed the function a table of values. I hope the code attached below will help this make more sense.
Current way the function is being used:
update bkas
set Address_Match_Flag = 1
from #bkas bkas
cross apply dbo.AddressCompare(BKA_Street1, BKA_Street2, BKA_City, BKA_State, BKA_Zip,
BKA_Street1_Current, BKA_Street2_Current, BKA_City_Current, BKA_State_Current, BKA_Zip_Current,
BKA_Id) ac
where ac.Join_Id = BKA_Id
and ac.Total_Match = 1
And to demonstrate, the output of the AddressCompare() function would look something like:
Street1_Match | Street2_Match | City_Match | State_Match | Zip Match | Total_Match | Join_Id |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 23 |
So if there are 100,000 records in #bkas that I want to perform address comparison on, I will need to call this function 100,000 times as a cross apply, each time feeding it the data from each record in #bkas. At least that is how I am understanding it as of now. However I think things would be much faster if I were able to feed the function the entire table, #bkas, and then have the output be a table as shown above, but with 100,000 records in it. That way the function is only called once. I would like to be able to feed the function any table as long as it meets the desired parameter requirements, i.e. having 5 columns which represent one address (street1, street2, city, state, zip) and 5 other columns representing the other address we wish to compare, and then some id column.
Possible new way to use the function if it were able to accept a table, and also the names of the columns we wish to use for the address data, or something like that:
-- First gather all data from the function into a cursor
select *
into #address_match_data
from dbo.AddressCompare(#bkas, all 11 columns which we wish to use as the names like shown in the above example)
So then the cursor #address_match_data would have the same amount of records as in #bkas, something like:
Street1_Match | Street2_Match | City_Match | State_Match | Zip Match | Total_Match | Join_Id |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 |
0 | 1 | 1 | 1 | 1 | 0 | 2 |
... | ... | ... | ... | ... | ... | ... |
1 | 1 | 1 | 1 | 1 | 1 | 100000 |
Attached below is the code structure I have for the function, AddressCompare():
create function [dbo].[AddressCompare]
(
@street1_t1 varchar(64),
@street2_t1 varchar(64),
@city_t1 varchar(64),
@state_t1 varchar(32),
@zip_t1 varchar(16),
@street1_t2 varchar(64),
@street2_t2 varchar(64),
@city_t2 varchar(64),
@state_t2 varchar(32),
@zip_t2 varchar(16),
@join_id int
)
returns @matchtable table
(
Street1_Match bit, Street2_Match bit, City_Match bit, State_Match bit, Zip_Match bit,
Total_Match bit, Join_Id int
)
as
begin
-------------------------------------
-- Remove any unnecessary punctuation
set @street1_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@street1_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @street2_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@street2_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @city_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@city_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @state_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@state_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @zip_t1 = trim(upper(replace(replace(replace(replace(replace(replace(@zip_t1, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @street1_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@street1_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @street2_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@street2_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @city_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@city_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @state_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@state_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
set @zip_t2 = trim(upper(replace(replace(replace(replace(replace(replace(@zip_t2, '.', ''), '#', ''), '-', ' '), ',', ''), '''', ''), ' ', ' ') ))
---------------------------------------------------------------
-- Shrink down spaces so that there is not more than 1 anywhere
while len(@street1_t1) - len(replace(@street1_t1, ' ', ' ')) >= 1 -- this loop will keep running until replacing double space with space does not change the length, i.e. no more double space.
begin
set @street1_t1 = replace(@street1_t1, ' ', ' ')
end
while len(@street2_t1) - len(replace(@street2_t1, ' ', ' ')) >= 1
begin
set @street2_t1 = replace(@street2_t1, ' ', ' ')
end
while len(@city_t1) - len(replace(@city_t1, ' ', ' ')) >= 1
begin
set @city_t1 = replace(@city_t1, ' ', ' ')
end
while len(@street1_t2) - len(replace(@street1_t2, ' ', ' ')) >= 1
begin
set @street1_t2 = replace(@street1_t2, ' ', ' ')
end
while len(@street2_t2) - len(replace(@street2_t2, ' ', ' ')) >= 1
begin
set @street2_t2 = replace(@street2_t2, ' ', ' ')
end
while len(@city_t2) - len(replace(@city_t2, ' ', ' ')) >= 1
begin
set @city_t2 = replace(@city_t2, ' ', ' ')
end
-------------------------------------------------------------
-- Expand all street suffixes using data in Codes table (RD -> ROAD, ST -> STREET, FIRST -> 1ST, ...)
declare @suffix_table table (id int, ctw_id int, cto_id int, ctw_shortdescr varchar(64), ctw_description varchar(64))
insert into @suffix_table
select row_number() over (order by ctw_id) as id,
ctw.ctw_id,
ctw.cto_id,
ctw.ctw_shortdescr,
ctw.ctw_description
from Codes ctw
where ctw.CTW_Type = 'Street Suffix'
and upper(ctw.CTW_Reference) = upper('AddressCompare')
declare @suffixcnt int = (select count(*) from @suffix_table)
declare @counter int = 1
declare @abbrev varchar(32)
declare @long varchar(64)
while @counter <= @suffixcnt
begin
set @abbrev = (select ctw_shortdescr from @suffix_table where id = @counter)
set @long = (select ctw_description from @suffix_table where id = @counter)
set @street1_t1 = trim(replace(' ' + trim(@street1_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
if @street2_t1 != ''
begin
set @street2_t1 = trim(replace(' ' + trim(@street2_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
end
set @city_t1 = trim(replace(' ' + trim(@city_t1) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
-------------------
set @street1_t2 = trim(replace(' ' + trim(@street1_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
if @street2_t2 != ''
begin
set @street2_t2 = trim(replace(' ' + trim(@street2_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
end
set @city_t2 = trim(replace(' ' + trim(@city_t2) + ' ', upper(' ' + @abbrev + ' '), upper(' ' + @long + ' ')))
set @counter = @counter + 1
end
-----------------------------------------------------------------------
-------------------
-- Matching Process
-------------------
insert into @matchtable
select cast(iif(@street1_t1 = @street1_t2, 1, 0) as bit) as Street1_Match,
cast(iif(@street2_t1 = @street2_t2, 1, 0) as bit) as Street2_Match,
cast(iif(@city_t1 = @city_t2, 1, 0) as bit) as City_Match,
cast(iif(@state_t1 = @state_t2, 1, 0) as bit) as State_Match,
cast(iif(@zip_t1 = @zip_t2, 1, 0) as bit) as Zip_Match,
cast(iif(@street1_t1 = @street1_t2 and @street2_t1 = @street2_t2 and @city_t1 = @city_t2 and @state_t1 = @state_t2 and @zip_t1 = @zip_t2, 1, 0) as bit) as Total_Match,
cast(@join_id as int) as Join_Id
return
end