The app I am writing deals with utility service addresses, and right now I am forcing the user to know enough to separate the parts of the address and put them in the appropriate fields before adding to the database. It has to be done this way for sorting purposes because a straight alphabetical sort isn't always right when there is a pre-direction in the address. For example, right now if the user wanted to put in the service address 123 N Main St, they would enter it as:

  • Street Number = 123
  • Pre-direction = N
  • Street Name = Main
  • Street Type = St

I've tried to separate this address into its parts by using the Split function and iterating through each part. What I have so far is below:

Public Shared Function ParseServiceAddress(ByVal Address As String) As String()
        'this assumes a valid address - 101 N Main St South
        Dim strResult(4) As String  '0=st_num, 1=predir, 2=st_name, 3=st_type, 4=postdir (VB upper bounds are inclusive)
        Dim strParts() As String
        Dim strSep() As Char = {Char.Parse(" ")}
        Dim i As Integer
        Dim j As Integer = 0
        Address = Address.Trim()
        strParts = Address.Split(strSep)  'split using spaces
        For i = 0 To strParts.GetUpperBound(0)
            If Integer.TryParse(strParts(i), j) Then
                'this is a number, is it the house number?
                If i = 0 Then
                    'we know this is the house number
                    strResult(0) = strParts(i)
                Else
                    'part of the street name
                    strResult(2) = strResult(2) & " " & strParts(i)
                End If
            Else
                Select Case strParts(i).ToUpper()
                    Case "TH", "ND"
                        'know this is part of the street name
                        strResult(2) = strResult(2) & strParts(i)
                    Case "NORTH", "SOUTH", "EAST", "WEST", "N", "S", "E", "W"
                        'is this a predirection?
                        If i = 1 Then
                            strResult(1) = strParts(i)
                        ElseIf i = strParts.GetUpperBound(0) Then
                            'this is the post direction
                            strResult(4) = strParts(i)
                        Else
                            'part of the name
                            strResult(2) = strResult(2) & " " & strParts(i)
                        End If
                    Case Else
                        If i = strParts.GetUpperBound(0) Then
                            'street type
                            strResult(3) = strParts(i)
                        Else
                            'part of the street name
                            strResult(2) = strResult(2) & " " & strParts(i)
                        End If
                End Select
            End If
        Next i
        Return strResult
    End Function
I've found this method to be cumbersome, slow, and even totally wrong when given a wonky address. I'm wondering if what I'm trying to do here would be a good application for a regular expression? Admittedly I've never used regex in anything before and am a total newbie in that regard.

Thank you in advance for any help. :)

Edit - Seems more and more like I'm going to need a parser and not just regex. Does anyone know of any good address parser libraries in .NET? Writing our own is just not in the cards right now, and would be sent to the back burner if it came to that.

  • Is the predirection always one letter? – Samantha Branham Mar 16 '09 at 19:57
  • @Stuart B - No, sometimes people type them out like "123 South Main St" –  Mar 16 '09 at 19:59
  • @Heather - that definitely makes this difficult problem even more hairy! You will definitely have to have a defined list of acceptable predirections. – Samantha Branham Mar 16 '09 at 20:03
  • I wonder, do you really need the subfields? What is your reason not to just put it into a single string, especially since it seems that the users will type the same name differently anyway (even disregarding typos and the usual spelling horrors)? – Svante Mar 16 '09 at 20:10
  • @Svante - I need the sub-fields because an alphabetical sort on a single address field does not put the streets in the correct numerical order. Example - "1123 Main St" would appear before "12 Main St" in an ascending sort –  Mar 16 '09 at 20:23
  • @Stuart B - I know, it is a hairy problem. The powers that be really do not want me to tie the users' hands in that regard, but I may have to in order to make it right. If it were totally up to me I WOULD restrict it to a single-letter pre-direction –  Mar 16 '09 at 20:24
  • A few potential problems: addresses with the direction after the street type; street numbers that contain text (rare, but it happens); addresses that contain unit/suite/apartment numbers; addresses that contain building names. – Scott Mar 16 '09 at 23:21
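
To make the sorting point from the comments concrete, here is a minimal C# sketch of why the separated fields help. The `ServiceAddress` class, its field names, and the chosen sort order (street name, then predirection, then house number) are all assumptions for illustration and are not taken from the asker's app.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AddressSortSketch
{
    // Hypothetical holder for the already-separated address fields.
    public sealed class ServiceAddress
    {
        public int StreetNumber;
        public string Predirection = "";
        public string StreetName = "";
        public string StreetType = "";
    }

    // Orders by street name, then predirection, then house number.
    // The house number is compared as an integer, not as text.
    public static List<ServiceAddress> SortForDisplay(IEnumerable<ServiceAddress> addresses)
    {
        return addresses
            .OrderBy(a => a.StreetName, StringComparer.OrdinalIgnoreCase)
            .ThenBy(a => a.Predirection, StringComparer.OrdinalIgnoreCase)
            .ThenBy(a => a.StreetNumber)
            .ToList();
    }
}
```

With "1123 Main St" and "12 Main St" as input, this puts 12 first, whereas a plain string sort on the full address puts "1123 Main St" first.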

3 Answers

I don't have a set of addresses to (easily) test against, but here is something to try at least. It may be too permissive in places or too restrictive in others, but you should be able to tweak it. You'll definitely need to adjust the list of predirections, since the regex has to spell out each accepted one explicitly. Also, be sure to set your regex options to be case-insensitive.

^(?<StreetNumber>[0-9]+)\s+(?:(?<Predirection>north|south|east|west|n|s|e|w)\s+)?(?<StreetName>[a-z0-9 '.-]+)\s+(?<StreetType>[a-z.]+)$

In reality though, it would probably be better to delegate this to an address parser if possible, like the one NoahD suggested. You'll have to do some digging to find something for .NET probably, but if you can't find anything, then I would go with a regular expression for sure.

edit: d'oh, \s, not /s

edit: changed regex for more semantic grouping. You can access the group values like so:

using System.Text.RegularExpressions;

string address = "123 n main st";
Regex regex = new Regex("insert the regex above here", RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(address);

foreach (Match match in matches)
{
    // Groups[...] returns a Group object; .Value holds the captured text.
    string streetNumber = match.Groups["StreetNumber"].Value;
    string predirection = match.Groups["Predirection"].Value;  // "" when absent
    string streetName = match.Groups["StreetName"].Value;
    string streetType = match.Groups["StreetType"].Value;
}
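
For anyone who wants something that compiles as-is, here is a minimal self-contained sketch along the same lines. It uses a lightly adjusted pattern (whitespace required after the house number, the hyphen moved to the end of the character class so it is read literally), and the class name `AddressRegexSketch` and method `Parse` are hypothetical names made up for this example. It has only been reasoned through against simple inputs like "123 N Main St", so treat it as a starting point and not a full parser.

```csharp
using System;
using System.Text.RegularExpressions;

static class AddressRegexSketch
{
    // Assumed pattern: number, optional predirection, street name, street type.
    private static readonly Regex AddressPattern = new Regex(
        @"^(?<StreetNumber>[0-9]+)\s+" +
        @"(?:(?<Predirection>north|south|east|west|n|s|e|w)\s+)?" +
        @"(?<StreetName>[a-z0-9 '.-]+)\s+" +
        @"(?<StreetType>[a-z.]+)$",
        RegexOptions.IgnoreCase);

    // Returns null when the input does not match; otherwise
    // { number, predirection, name, type }, with "" for a missing predirection.
    public static string[] Parse(string address)
    {
        Match match = AddressPattern.Match(address.Trim());
        if (!match.Success)
        {
            return null;
        }
        return new[]
        {
            match.Groups["StreetNumber"].Value,
            match.Groups["Predirection"].Value,
            match.Groups["StreetName"].Value,
            match.Groups["StreetType"].Value
        };
    }
}
```

Checking `match.Success` before reading the groups lets the caller tell "no predirection" (an empty string) apart from "not an address this pattern understands" (null). Post-directions, unit numbers, and house numbers containing letters are not handled here.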
Samantha Branham
  • Hmmm... I think I didn't quite understand previously what regex did. As you said, an address parser is probably what I need. Plugging this expression into .NET's Regex object worked very well to validate my input, so +1 on that account. Thank you for your help. :) –  Mar 16 '09 at 21:29
  • Actually, you can use regex to extract parts of a string. I sort of wrote this one sloppily, so it may be harder to know which groups to pull. Just do a google search for "C# Regex Groups" or something. – Samantha Branham Mar 16 '09 at 23:06
  • As for an address parser, I think geocoder.us has one. I don't know if you have to pay for it or not, though. – Samantha Branham Mar 16 '09 at 23:07
  • This actually helps me a lot, but doesn't totally solve the problem. I think resources are going to be allocated into better endeavors for the time being. Thank you again for taking the time to help me out with this. –  Mar 18 '09 at 00:48

You could do this in Perl using Geo::StreetAddress::US.

For example:

  use Geo::StreetAddress::US;

  my $hashref = Geo::StreetAddress::US->parse_address(
                "1600 Pennsylvania Ave, Washington, DC" );
szabgab
NoahD
  • Too bad this is in VB.NET, because that is almost exactly what I am looking for. You don't happen to know of any parser libraries in .NET? –  Mar 16 '09 at 21:43
  • Actually, this might be the better thread: http://stackoverflow.com/questions/16413/parse-usable-street-address-city-state-zip-from-a-string – NoahD Mar 16 '09 at 21:51

Would using Geocoding from Google be appropriate for your app?

http://code.google.com/apis/maps/documentation/services.html#Geocoding_Structured

NoahD