1

I'm attempting to write a query which will find similar addresses in a database.

For example, users may type in a variety of addresses which may be incorrect. A similar address should be shown as the result.

For example, users might search for -

"177 pleasant street" 

"177 pleasant st"

"177 pleasant road"

How can I implement this?

So far I have tried -

SELECT * FROM Table WHERE Address LIKE '%177 pleasant st%' AND ...etc...
AdamMc331
  • 16,492
  • 10
  • 71
  • 133
fc123
  • 898
  • 3
  • 16
  • 40
  • 1
    I'm going on a limb here, but is the problem that you used 'edmonton' and not 'pleasant'? – AdamMc331 Oct 28 '14 at 16:14
  • 5
    This is a difficult problem that is not easily solved using SQL. – Abe Miessler Oct 28 '14 at 16:16
  • I see a problem again. Not a syntax error, but an error in your logic. You included 'st' in your wild card. If I put 'road', this will not be returned. This will likely only return 'st' and 'street' – AdamMc331 Oct 28 '14 at 16:17
  • 2
    I recommend leaving it as `like '%177 pleasant%'` – AdamMc331 Oct 28 '14 at 16:17
  • 1
    It seems like the OP is looking for a general solution, not limited to the specific case of `177 pleasant` – Eilidh Oct 28 '14 at 16:18
  • 1
    @McAdam331 r u suggesting I should create a stop word list in application level – fc123 Oct 28 '14 at 16:18
  • @Eilidh Sure, but in this specific example these could cause problems. By putting st at the end, this would fail for any street, not just pleasant. – AdamMc331 Oct 28 '14 at 16:19
  • 1
    @xyz if you are able (somewhere) to pull the street number and street name, I would use that as a wildcard. It may not be foolproof, but it seems intuitive enough to me. Your concern is that the user knows that information, even if they don't know to use street, ave, road, etc. – AdamMc331 Oct 28 '14 at 16:20
  • 1
    @McAdam331 True :) I was merely pointing out that a specific solution won't solve the general problem he seems to be asking about. – Eilidh Oct 28 '14 at 16:20
  • 1
    @Eilidh agreed, but that's why I took to the comments to make sure that _wasn't_ the problem, and that something else is going on. – AdamMc331 Oct 28 '14 at 16:21

5 Answers5

1

I would consider using full text search for this instead.

The big benefit is that it would help with spellings (pleasant, plesant, pleasont etc.). Of course, it would match similarity primarily on linguistical grounds, not geographical ones.

Stephen
  • 1,737
  • 2
  • 26
  • 37
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • 1
    The additional advantage to using full-text search for this is that you can define your own synonyms. In other words, you can force it to consider "road" and "street" as the same word if it fits your needs. – Allan Oct 28 '14 at 17:06
0

That's not an easy task.

You should extract the entities of the search string and the stored string and compare the same kind of entities.

In your example:

street_number: the numeric part of string.

street_name: the non-numeric part of string excluding words in a dictionary {street, st, road}

IF street_number AND street_name MATCH you can return that address.

Extract entities for general cases is dificult and you can use tools to help you.

Check: https://developers.google.com/maps/documentation/javascript/geocoding

pedromrnd
  • 153
  • 5
0

SOUNDEX function in SQL Server can get your basic matching done.

SELECT * FROM [Table] WHERE SOUNDEX([Address]) = SOUNDEX('177 pleasant st')

If you want to further customize your search by 'percent of keyword matches', you can go for implementing Jaro Winkler algorithm in a stored procedure which does the matching for you. You can start reading about Jaro Winkler here. Otherwise as suggested previously, you might need to go for matching tools.

Also take a look here

Community
  • 1
  • 1
SouravA
  • 5,147
  • 2
  • 24
  • 49
0

It depends on what you want to achieve. By the looks of it whatever the request the most important information is “177” and “pleasant” and “road” (information from your original database). So, if there are only three search parameters then you can use simple query like one below. If you want to use more then you will have to use full text search.

    DECLARE @DataFromUserInterFace varchar(300)
SET @DataFromUserInterFace='177 pleasant street'

SELECT 
[Address] 
FROM
[Employer]
WHERE 
CHARINDEX
(
 substring([Address], 1, CHARINDEX(' ', [Address], 1)-1)
 , @DataFromUserInterFace 
 , 1)!=0
OR 
CHARINDEX
(
substring
([Address]
, CHARINDEX(' ', [Address], 1)+1
, CHARINDEX(' ', [Address], CHARINDEX(' ', [Address], 1)+1)
-CHARINDEX(' ', [Address], 1)-1)
, @DataFromUserInterFace
, 1)
!=0
OR
CHARINDEX
(
substring([Address], CHARINDEX(' ', [Address], CHARINDEX(' ', [Address], 1)+1)+1,
CHARINDEX(' ', [Address], CHARINDEX(' ', [Address], 1)+1)
-CHARINDEX(' ', [Address], CHARINDEX(' ', [Address], 1)-1)
)
, @DataFromUserInterFace
, 1)
!=0
Pavel Nefyodov
  • 876
  • 2
  • 11
  • 29
-1

If you normalize the address a bit the task would become easier or at least more accurate. Break address into it's own table like this:

 Create table Address(
       AddressKey    int not null ,
       addressnumber nvarchar(10), 
       street        nvarchar (50),
       streetSuffix  nvarchar (10)
       ... (city,state,zip,etc.)
       )

Your query can then specify what part of the address you want to search for similarities against.

Select * 
from table t inner join 
address a on a.addresskey=t.addresskey
Where a.street = 'pleasant'
      and a.streetnumber='177'

Note: I Left streetnumber as nvarchar to accommodate '23A' type stuff but even that could be broken out.

Oliver
  • 167
  • 8