-1

I am doing a piece of work that involves standardizing address formats for my company and I was wondering if it was possible to isolate the housing/apartment number of a dwelling in a string in SQL Server. The data quality is currently poor and therefore address standardization is not good.

Some examples below; with desired outcome

Dwelling Address Outcome Desired
Flat 123 Fake Street 123
48 Bond Street 48
Apartment 234 Lemon Lane 234

Any code examples would be greatly appreciated.

Many thanks.

pdwebb23
  • 13
  • 2
  • This should help https://stackoverflow.com/questions/16667251/query-to-get-only-numbers-from-a-string – Edward Radcliffe Apr 08 '22 at 16:17
  • What happens if you have "Flat 1 12 Long Street"? – Thom A Apr 08 '22 at 16:31
  • There's no magic in any SQL language that will be able to parse an address reliably 100% of the time. In fact there are multiple software packages designed to do exactly this and, even at the high end at thousands of dollars per seat, still mess up an alarmingly high percentage of the time. – Aaron Bertrand Apr 08 '22 at 16:33

3 Answers3

4

It appears from your sample data you just want to extract the numeric digits.

The most performant way I've found to do this in SQL Server is a combination of translate and replace.

with d as (
  select * from
  (values ('Flat 123 Fake Street'),('48 Bond Street'),('Apartment 234 Lemon Lane'))v(Address)
)
select *, 
 Outcome = Replace(Translate(Address,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',Replicate(' ',26)), ' ','')
from d;

Add any other required punctuation to the string and increment 26 accordingly.

In production it can be implemented as a table-valued function.

Stu
  • 30,392
  • 6
  • 14
  • 33
  • 1
    Nice use of translate.... Still not in the habit of considering it. +1 – John Cappelletti Apr 08 '22 at 18:36
  • 1
    Recently replaced a scalar function that was looping over a string character by character to extract only the digits (think phone number free text entry that allows names or notes to be included) - using this method was 10x faster on average, very useful when used against a table of 5 million customers! – Stu Apr 08 '22 at 18:45
2

Parsing Addresses can be risky, you may want to consider Address standardization within a database

That said, with a bit of string manipulation and the use of CROSS APPLY (or two)

Example

Declare @YourTable Table ([Dwelling Address] varchar(50))  Insert Into @YourTable Values 
 ('Flat 123 Fake Street')
,('48 Bond Street')
,('Apartment 234 Lemon Lane')
,('Flat 1 12 Long Street')
,('No Number')
 
Select A.*
      ,NewValue = substring(S1,1,patindex('%[a-z]%',S1+'a')-1)
 From @YourTable A
 Cross Apply ( values (patindex('%[0-9]%',[Dwelling Address])) )B(P1)
 Cross Apply ( values (stuff([Dwelling Address],1,P1-1,'')   ) )C(S1)

Results

Dwelling Address            NewValue
Flat 123 Fake Street        123 
48 Bond Street              48 
Apartment 234 Lemon Lane    234 
Flat 1 12 Long Street       1 12 
No Number                   NULL
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
0

your data

DECLARE @a TABLE
  (
     dwelling_address VARCHAR(max) NOT NULL
  );

INSERT INTO @a
            (dwelling_address)
VALUES      ('Flat 123 Fake Street'),
            ('48 Bond Street'),
            ('Apartment 234 Lemon Lane');  

query to get your desired result

SELECT dwelling_address,
       LEFT(Substring(dwelling_address, Patindex('%[0-9.-]%', dwelling_address),
            8000),
       Patindex('%[^0-9.-]%', Substring(dwelling_address, Patindex('%[0-9.-]%',
                              dwelling_address
       ), 8000)
                              + 'X') - 1) as Outcome_Desired
FROM   @a  
RF1991
  • 2,037
  • 4
  • 8
  • 17