1

I am using Sql Server 2014

I have two tables, and each has two columns. One has names and addresses(table name is dbo.tgt_address), another has common street abbreviations and a corresponding abbreviation I want to change the original abbreviation to (table name is dbo.street_abreviations).

enter image description here enter image description here

I need a query to change the text in the Address column of the tgt_address table, where the text contains a text from the Street_abbreviation table's commonly_used column, to what the text in Abbreviation.

Output:

enter image description here

I understand that this could work for individual rows:

select replace(Address,'123 State St Chester PA 25374', '123 State ST Chester PA 25374')

but as there are many more records in the table that what I am reporting here, how can I create a faster process? would I need to incorporate substring() and a subselect in some way? I've tried to find documentation on this but failed so I hope someone can provide some direction to at least get me started, thank you!

Rbell
  • 46
  • 1
  • 10
  • 1
    Cleansing addresses is nearly impossible to do like you are trying to do. Just look at your first example you have ST as a replacement for st. But you have two instances of the string "st" in the first address. You would just about have to split this on spaces, then evaluate each element to determine if it needs to adjusted, then finally reassemble the fragments again. – Sean Lange Jun 05 '19 at 19:48
  • 1
    This is a slippery slope. You think you've got items nailed like St for Street, but what about St Louis. DE for Delaware, but what about De Soto, GA ? Perhaps take a peek at https://stackoverflow.com/questions/41249742/address-standardization-within-a-database/41249971#41249971 – John Cappelletti Jun 05 '19 at 19:50
  • 3
    As Sean stated, this is not a task for a single individual. There are companies that make very good money doing this. Search for address standardization. If you are intent on creating your own address standardization tool, you should first think about breaking the addresses up into their parts. Site number, street name, street name prefix, street name suffix, zip, zip4, site number prefix/suffix (for 123 B Main St and such). Very complex topic to standardize addresses that have a somewhat widespread standard (their parts) but... – jamie Jun 05 '19 at 19:53
  • 1
    with over 3000 counties in the USA, you get all sort of local intricacies to addresses. As John said as well, this is a dangerously slippery slope. You make some progress with some address then more addresses start coming in (assuming success) and you starting seeing all kinds of goofiness and you are already committed to your address standardizer so you keep going... ugh! Be careful! Good luck. – jamie Jun 05 '19 at 19:55
  • Sample data is best served as [DDL](https://en.wikipedia.org/wiki/Data_definition_language) + [DML](https://en.wikipedia.org/wiki/Data_manipulation_language). Please [edit] your question to include it, your current attempt and your desired results. For more details, [read this.](https://dba.meta.stackexchange.com/questions/2976/help-me-write-this-query-in-sql) – Zohar Peled Jun 05 '19 at 20:20
  • great notes everyone thank you. – Rbell Jun 11 '19 at 14:20

1 Answers1

0

It is slippery slope with addresses, but this can help I believe:

Initial data:

DECLARE @tgt_address TABLE (ID INT IDENTITY(1,1),Name NVARCHAR(255),Address NVARCHAR(255));
INSERT INTO @tgt_address(Name,Address)VALUES
      ('John','123 State St Chester PA 25374')
     ,('Steve','798 Main Avn New York NY 21314')
     ,('Martha','981 West Ln Wilmington De 23142')
     ,('Mary','124 Main Street #2 Austin Tx 21432')
     ,('Timothy','25 Lark Thwy Maiami FL 12342')
;
DECLARE @Street_abbreviation TABLE (Commonly_used NVARCHAR(255),Abbreviation NVARCHAR(255));
INSERT INTO @Street_abbreviation(Commonly_used,Abbreviation)VALUES
     ('St','ST')
    ,('Avn','AVE')
    ,('Ln','LN')
    ,('#','APT')
    ,('Thwy','THROUGHWAY')
;

Not the best, but easy code:

SELECT t.Name,t.Address
    ,REPLACE(REPLACE(t.Address,'#','# '),' ' + a.Commonly_used + ' ',' ' + a.Abbreviation + ' ') AS [Result]
FROM @tgt_address t
INNER JOIN @Street_abbreviation a ON REPLACE(t.Address,'#','# ') LIKE '%[ ]' + a.Commonly_used + '[ ]%'
;

Smarter Code:

SELECT r.Name,r.Address,r.Result
FROM (
    SELECT t.Name,t.Address
        ,REPLACE(REPLACE(t.Address,'#','# '),' ' + a.Commonly_used + ' ',' ' + a.Abbreviation + ' ') AS [Result]
        ,ROW_NUMBER()OVER(PARTITION BY t.Name,t.Address 
                        ORDER BY CASE WHEN REPLACE(t.Address,'#','# ') LIKE '%[ ]' + a.Commonly_used + '[ ]%' THEN 0 ELSE 1 END) AS [rn]
    FROM @tgt_address t
    CROSS JOIN @Street_abbreviation a
) r
WHERE r.rn = 1
;

Apartments are tricky as they have no space around #

Vitaly Borisov
  • 1,133
  • 2
  • 13
  • 20