I have found C# and Perl libraries that look good for street address parsing:

https://usaddress.codeplex.com/

http://search.cpan.org/~timb/Geo-StreetAddress-US-1.04/US.pm

Is there ColdFusion code, or something that can be run from CF, that does the same job?

  • http://gis.stackexchange.com/questions/26501/can-i-use-the-google-geocoding-api-to-parse-and-standardize-address-data – Henry Mar 31 '16 at 19:54
  • Is this really a duplicate question? He asked if there are any CF/Java libraries that could parse addresses, whereas the other question is simply "how do I parse it". The responses on the other question don't provide any Java parsing library recommendations. (I found 2 and listed them below.) – James Moberg Apr 02 '16 at 00:07

3 Answers

If you need to parse and validate, you could use the SmartyStreets.com LiveAddress API. A ColdFusion CFC from 2011 is available that connects to the API, validates the address, and returns a JSON struct with additional location-based data for that address.

http://smartystreets.riaforge.org/

SmartyStreets also has a JavaScript Address Autocomplete API that you can include on any web-based form, so incoming addresses are validated before you ever save them to your back-end database (including the identification and full separation of address parts).

Check out the features... it's more robust than any offline library:

https://smartystreets.com/features

The only downside is that it is a paid service, and you can only process 250 addresses for free each month:

https://smartystreets.com/free-address-verification

In addition to using the CFC, I use a separate wrapper for SmartyStreets' "Smartylist" (a command-line tool). It basically uploads a CSV file and returns the same file with additional columns alongside the originally submitted data.

https://smartystreets.com/docs/smartylist/command-line-tool

James Moberg

"Address Cleaner" is a ColdFusion ColdBox plugin that uses JGeoCoder.

https://github.com/angelseye/Address-Cleaner

http://jgeocoder.sourceforge.net/

Once the JAR file is installed, you can call the CFC using the methods below. The parsed result uses the following keys: CITY, LINE2, NUMBER, PREDIR, STATE, STREET, TYPE, ZIP. (NOTE: You'll need to modify the CFC if you aren't using the ColdBox framework.)

clearAddressString = addressCleaner.cleanAddress(FullAddressString);

addressStruct = addressCleaner.getAddressStruct(FullAddressString);
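
As a rough sketch of how the returned struct might be consumed, the helper below builds a one-line summary from the keys listed above. `summarizeAddress` is a hypothetical name, not part of Address Cleaner, and the `sample` struct is a hand-built stand-in with made-up values rather than real `getAddressStruct()` output. It assumes the values are simple strings and that some keys may be missing or empty.

```cfml
<!--- Hypothetical helper (not part of Address Cleaner): summarizes a parsed-address struct.
      Assumes the keys listed above; keys that are absent or empty are skipped. --->
<cffunction name="summarizeAddress" returntype="string" output="false">
    <cfargument name="parsed" type="struct" required="true">
    <cfset var keys = "NUMBER,PREDIR,STREET,TYPE,LINE2,CITY,STATE,ZIP">
    <cfset var parts = []>
    <cfset var key = "">
    <cfloop list="#keys#" index="key">
        <cfif StructKeyExists(arguments.parsed, key) AND Len(Trim(arguments.parsed[key]))>
            <cfset ArrayAppend(parts, key & "=" & Trim(arguments.parsed[key]))>
        </cfif>
    </cfloop>
    <cfreturn ArrayToList(parts, "; ")>
</cffunction>

<!--- Stand-in struct with made-up values, used in place of a real getAddressStruct() result. --->
<cfset sample = { NUMBER = "123", STREET = "ABC", TYPE = "ST", CITY = "ANY TOWN", STATE = "MA", ZIP = "12345" }>
<cfoutput>#summarizeAddress(sample)#</cfoutput>
```

In real use you would pass `addressStruct` instead of `sample`. Checking each key with `StructKeyExists` first avoids errors when the parser could not identify a part of the address.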

Another Java option that could be used with ColdFusion is "International Address Parser". (Multiple country packs are available, but no prices are listed.)

http://www.address-parser.net/product-java.php

http://address-parser.net/documentation/documentation.php#java

James Moberg
  • This works, but it does not handle apartment and suite numbers. I tested JGeoCoder, and it does not work as well as usaddress (https://github.com/datamade/usaddress; demo: https://parserator.datamade.us/usaddress). usaddress is Python, so I can either call it from Jython or wrap it as a service. Thanks for the help! – Richard Hughes Apr 02 '16 at 23:24
  • @RichardHughes Address-Parser.net charges ~3,000 € for a single country or 9,000 € for all countries. Their evaluation version is only capable of parsing a single, predetermined address. I've also found a Windows COM component that seems more capable and can additionally parse name, phone, email and website from a text string. http://www.address-parser.com/ (I've requested a price quote for a small project.) They also have a web API, but no advertised rates. – James Moberg Apr 04 '16 at 18:33

I have written a preliminary parser for the street address and ZIP code. Here is a snippet to get you started (https://bitbucket.org/snippets/mrrobwad/KR9Mg):

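<!--- Sample input to parse. --->
<cfset loc.SOURCETEXT = "123 ABC Street, Any Town, MA 12345">

<!--- ZIP: a separator followed by 5 digits and an optional -NNNN suffix. --->
<cfset loc.array_zip_code = REMatchNoCase("[-:.\s][0-9]{5}([-][0-9]{4})?",loc.SOURCETEXT)>
<cfif ArrayLen(loc.array_zip_code) GT 0>
    <!--- Strips the leading separator (and the hyphen of a ZIP+4, leaving nine digits). --->
    <cfset loc.ZIP = REReplaceNoCase(loc.array_zip_code[1],"[-:.\s]","","All")>
    <b>ZIP:</b>
    <cfdump var="#loc.ZIP#">
</cfif>
<br><br>
<!--- Street address: find the first street-type word, then match "number + words" up to it. --->
<cfset loc.street_types = "STREET|ST|DRIVE|DR|AVENUE|AVE|ROAD|RD|LOOP|COURT|CT|CIR|CIRCLE|LANE|LN|BOULEVARD|BLVD">
<cfset loc.array_street_type = REMatchNoCase("(\s)+(#loc.street_types#)[^a-zA-Z]",loc.SOURCETEXT)>
<cfif ArrayLen(loc.array_street_type) GT 0>
    <!--- Keep the text through the end of the street-type match (FindNoCase is 1-based, hence the -1). --->
    <cfset loc.street_address_trimmed = Left(loc.SOURCETEXT,FindNoCase(loc.array_street_type[1],loc.SOURCETEXT)+Len(loc.array_street_type[1])-1)>
    <!--- The trailing delimiter is dropped here because the pattern only accepts digits, spaces and letters. --->
    <cfset loc.array_street_address = REMatchNoCase("(\d{1,5})+(\s([a-zA-Z])+)+",loc.street_address_trimmed)>
    <cfif ArrayLen(loc.array_street_address) GT 0>
        <cfset loc.ADDRESS = loc.array_street_address[1]>
        <b>ADDRESS:</b>
        <cfdump var="#loc.ADDRESS#">
    </cfif>
</cfif>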
Robert Waddell
  • It's close. The script (originally posted on 3/31/2016) didn't identify the zip code. (The address ended with a valid US 5-digit zip code.) I separated the address and city using a comma, and the comma was included in the `array_street_type` and `street_address_trimmed` results. – James Moberg Mar 31 '16 at 21:25
  • That's interesting. Can you provide sample text you are extracting from? – Robert Waddell Apr 01 '16 at 14:07
  • Remove the extra unnested `` at the end. Try this address `123 ABC Street, Any Town, MA 12345`. The street values end with `,` and the zip code isn't detected. Sample format used here http://englishplus.com/grammar/00000085.htm and in many other places. (`Post Office Box 203, Shelton, CT 06484` returns the state as the street.) – James Moberg Apr 01 '16 at 17:08
  • Fixed nesting and updated example code to dump results. I just realized I wrote this code to specifically match zipcodes starting with 2, so I have updated this example to match zips starting with any number. Also, this was built to handle residential addresses and not PO Boxes but you could follow similar logic to extend this to match PO Boxes. – Robert Waddell Apr 01 '16 at 20:58
  • I didn't specifically choose the PO box. It was a sample address format in the link I found (regarding the use of commas and spaces). PO boxes are pretty common. (Disclaimer: JGeoCoder doesn't parse this correctly either, but "International Address Parser" does. The SmartyStreets API does too, in addition to verifying it.) – James Moberg Apr 01 '16 at 21:04
  • Considering the variation across valid addresses, if you are going to build your own parser I think you need to aim for a 'lowest common denominator' success rate that works for your application. The sample code I shared works for my application and purposes, and I thought it might be helpful to the OP. I don't presume to think my few lines of code can handle all address variations, but it is a simple approach that can be tweaked to meet constrained use cases. If the OP truly needs to parse ANY valid address, then a mature library or API needs to be used. BTW thanks James for your engaged feedback! – Robert Waddell Apr 02 '16 at 21:33
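
The PO Box extension mentioned in the comments could be sketched along the same lines as the snippet in the answer. This is only an illustration, not the answer author's code: `extractPoBox` is a hypothetical helper name, and the pattern only covers the spellings "PO Box", "P.O. Box", "P. O. Box" and "Post Office Box" followed by a number.

```cfml
<!--- Hypothetical helper (not from the original snippet): returns the first PO Box
      designation found in the text, or an empty string if there is none. --->
<cffunction name="extractPoBox" returntype="string" output="false">
    <cfargument name="sourceText" type="string" required="true">
    <!--- "PO", "P.O.", "P. O." or "POST OFFICE", then "BOX" and a box number. --->
    <cfset var matches = REMatchNoCase("(P\.?\s*O\.?|POST\s+OFFICE)\s*BOX\s+[0-9]+", arguments.sourceText)>
    <cfif ArrayLen(matches) GT 0>
        <cfreturn Trim(matches[1])>
    </cfif>
    <cfreturn "">
</cffunction>

<!--- Sample address taken from the comments above. --->
<cfset poBox = extractPoBox("Post Office Box 203, Shelton, CT 06484")>
<cfoutput><b>PO BOX:</b> #poBox#</cfoutput>
```

Running this check before the street-type match would let you skip street parsing for PO Box addresses, which avoids misreading the state abbreviation `CT` as a street type. The pattern is deliberately loose: it does not require a word boundary before "PO", so unusual input such as a word ending in "po" directly before "Box" could produce a false match.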