0

I have a somewhat unusual need to find/replace values in a string from values in a separate table.

Basically, I need to standardize a bunch of addresses, and one of the steps is to replace things like St, Rd or Blvd with Street, Road or Boulevard. I was going to write a function with bunch of nested REPLACE() statements, but this is 1) inefficient; and 2) not practical. There are over 500 possible abbreviations for street types according the USPS website.

What I'd like to do is something akin to:

REPLACE(Address1, Col1, Col2) where col1 and col2 are abbreviation and full street type in a separate table.

Anyone have any insight into something like this?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • SQL Server might not be the best place for this, as it does not have any regex replacement support. – Tim Biegeleisen Mar 22 '18 at 00:31
  • build an abbreviation dictionary and replace your data – RoMEoMusTDiE Mar 22 '18 at 00:43
  • How will you handle separating words, e.g. replacing all occurrences of "Rd" using `Replace()` will do fun things with "Bigbird Street"? – HABO Mar 22 '18 at 00:49
  • I agree with @HABO and others this could produce very strange results, and SQL isn't really the right place to do this. – Kevin Mar 22 '18 at 01:02
  • T-SQL is unfortunately the only solution in this scenario. I'm migrating data for a customer from a legacy database into a new (different) database, and need to suppress an records that already exist based upon an address match. Standardizing the address is part of the process. – Jeff Jacobs Mar 22 '18 at 01:18
  • 1
    Having been down this road before, you may want to take a peek at https://stackoverflow.com/questions/41249742/address-standardization-within-a-database/41249971#41249971 – John Cappelletti Mar 22 '18 at 02:43

1 Answers1

0

You can do such replacements using a recursive CTE. Something like this:

with r as (
      select t.*, row_number() over (order by id) as seqnum
      from replacements
     ),
     cte as (
      select replace(t.address, r.col1, r.col2) as address, seqnum as i
      from t cross join
           r
      where r.seqnum = 1
      union all
      select replace(cte.address, r.col1, r.col2), i + 1
      from cte join
           r
           on r.i = cte.i + 1
     )
select cte.*
from (select cte.*, max(i) over () as maxi
      from cte
     ) cte
where maxi = i;

That said, this is basically iteration. It will be quite expensive to do this on a table where there are 500 replacements per row.

SQL is probably not the best tool for this.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786