Regular expression for address field validation

Question

I am trying to write a regular expression that validates an address, for example `21-big walk way` or `21 St.Elizabeth's drive`. I came up with the following regular expression, but I am not sure how to incorporate all the characters (alphanumeric, space, dash, full stop, apostrophe).

"regexp=^[A-Za-z-0-99999999'

This is a very vague purpose for a REGEX. What are the limitations - what characters are allowed/disallowed? An address could contain practically anything. Also, `0-99999` will have no effect as this is a character class - it matches one character at a time, so it should be simply `0-9`. — Mitya, Jul 12 '12 at 16:50
Regex is either too specific, or too loose for this purpose. You can only validate to see something **looks like** an address or not. — nhahtdh, Jul 12 '12 at 16:51
[Falsehoods programmers believe about addresses](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/) — Mark Rotteveel, Jul 31 '22 at 09:55
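
Mitya's point about character classes can be checked directly. The sketch below uses Python's `re` module (an assumption; the question does not name a language) to show that `[0-99999]` and `[0-9]` accept exactly the same single characters:

```python
import re

# Inside [...], "0-9" is a range and the extra 9s are just repeated members,
# so both classes describe the same set of single characters.
long_class = re.compile(r"[0-99999]")
short_class = re.compile(r"[0-9]")

ascii_chars = [chr(code) for code in range(128)]
accepted_long = {c for c in ascii_chars if long_class.fullmatch(c)}
accepted_short = {c for c in ascii_chars if short_class.fullmatch(c)}

same_set = accepted_long == accepted_short
# A class without a quantifier still matches only one character at a time.
matches_two_digits = long_class.fullmatch("42") is not None
print(same_set, matches_two_digits)
```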

score 29 · Accepted Answer · edited Jan 17 '21 at 12:01

See the answer to this question on address validating with regex: regex street address match

The problem is that street addresses vary so much in formatting that it's hard to code against them. If you are trying to validate addresses, deciding that one is invalid based on its format alone is very hard to do. The following pattern would match this address (253 N. Cherry St.) and anything with the same format:

\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.

This allows 1-5 digits for the house number, a space, a character followed by a period (for N. or S.), 1-2 words for the street name, finished with an abbreviation (like st. or rd.).
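
A quick way to try the pattern is Python's `re` module (an assumption; the answer does not name a language). This sketch escapes the first period as `\.`, in line with the description of a letter followed by a period, and the helper name `looks_like_street_address` is made up for illustration:

```python
import re

# House number, direction letter plus period, 1-2 street words, abbreviation plus period.
STREET_PATTERN = re.compile(r"\d{1,5}\s\w\.\s(\b\w*\b\s){1,2}\w*\.")


def looks_like_street_address(text: str) -> bool:
    """Return True if the whole string has the '253 N. Cherry St.' shape."""
    return STREET_PATTERN.fullmatch(text) is not None


for candidate in ["253 N. Cherry St.", "21-big walk way", "21 St.Elizabeth's drive"]:
    print(candidate, "->", looks_like_street_address(candidate))
```

Only the first candidate has the expected shape; both addresses from the question are rejected by this format.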

Because regex is used to check whether things meet a standard or protocol (which you define), you probably wouldn't want to allow the addresses given in the question, especially the first one with the dash, since they aren't very standard. You can modify the pattern above to allow for them if you wish. For example, you could add

(-?)

to allow for a dash but not require one.

In addition, http://rubular.com/ is a quick and interactive way to learn regex. Try it out with the addresses above.

score 21 · Answer 2 · answered Sep 17 '13 at 15:26

If you don't have a fixed format for the address as mentioned above, I would use a regex just to reject the symbols that are not used in addresses (special symbols such as &(%#$^). The result would be:

[A-Za-z0-9'\.\-\s\,]
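
To act as a whitelist, the class needs a quantifier and has to cover the whole string. A minimal Python sketch of that idea (the language and the helper name `has_only_allowed_chars` are assumptions, not part of the answer):

```python
import re

# The character class from the answer, repeated one or more times.
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9'\.\-\s\,]+")


def has_only_allowed_chars(address: str) -> bool:
    """Return True if every character of the address is in the allowed class."""
    return ALLOWED_CHARS.fullmatch(address) is not None


samples = [
    "21 St.Elizabeth's drive",
    "21-big walk way",
    "Suite #4",                      # '#' is not in the class
    "Hochstraße 77, 81541 München",  # 'ß' and 'ü' are outside A-Za-z
]
for sample in samples:
    print(sample, "->", has_only_allowed_chars(sample))
```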

Serzas

Picky point, but # is in common use in the USA for apartment or suite number and should be allowed. – Chuck Krutsinger Jan 03 '19 at 19:43

This won't work if you put a letter or number after a symbol. This works though `!v.match(/[!@$%^&*(),?":{}|<>]/g)` – BamBam22 Nov 15 '19 at 14:57

Another Picky point, What if user is using German alphabet? For example this address is not a valid one: Hochstraße 77, 81541 München – Kasir Barati Apr 01 '23 at 01:41

score 11 · Answer 3 · edited May 23 '17 at 12:34

Just to add to Serzas' answer (since I don't have enough rep to comment): letters and digits can effectively be replaced by \w (which also matches the underscore). Additionally, the apostrophe, comma and period don't need a backslash inside a character class, and neither does the hyphen as long as it is placed first or last in the class. My requirement also involved forward and back slashes, hence \\ and /, and finally whitespace with \s. The working regex for me was:

pattern: "[\w',\\/.\s-]"
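
Hyphen placement matters inside a character class. The Python sketch below (the language is an assumption) compares a class with the hyphen between the comma and the backslash against one with the hyphen at the end. In the first, `,-\\` is read as a range from `,` to `\`, which silently admits characters such as `@`, `;` and `?`:

```python
import re

# A hyphen between two characters forms a range: here "," (0x2C) up to "\" (0x5C).
hyphen_in_middle = re.compile(r"[\w',-\\/.\s]")
# A hyphen placed last in the class is a literal hyphen.
hyphen_at_end = re.compile(r"[\w',\\/.\s-]")

for char in ["-", "/", "@", ";", "?"]:
    in_middle = hyphen_in_middle.fullmatch(char) is not None
    at_end = hyphen_at_end.fullmatch(char) is not None
    print(repr(char), "middle:", in_middle, "end:", at_end)
```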

answered Feb 03 '17 at 10:00

oliver_48

Jayakumari Arumugham · Answer 4 · 2017-10-09T12:34:31.810

Regular expression for simple address validation

^[#.0-9a-zA-Z\s,-]+$

Example of an address that matches:

#1, North Street, Chennai - 11

Example of an address that does not match:

$1, North Street, Chennai @ 11
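
The two cases above can be checked with a few lines of Python (an assumption; the answer does not say which language it targets). The helper name `is_simple_address` is hypothetical:

```python
import re

# Raw string, so the regex engine sees \s as a whitespace class.
SIMPLE_ADDRESS = re.compile(r"^[#.0-9a-zA-Z\s,-]+$")


def is_simple_address(text: str) -> bool:
    """Return True if the text contains only the characters allowed by the class."""
    return SIMPLE_ADDRESS.match(text) is not None


print(is_simple_address("#1, North Street, Chennai - 11"))
print(is_simple_address("$1, North Street, Chennai @ 11"))
```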

edited Oct 09 '17 at 12:34

answered Oct 06 '17 at 11:13

Jayakumari Arumugham

Please paste the text here instead of posting screenshots. – bfontaine Oct 06 '17 at 11:57
Sorry. Regular expression for simple address validation ^[#.0-9a-zA-Z\s,-]+$ E.g. for Address match case #1, North Street, Chennai - 11 and E.g. for Address not match case $1, North Street, Chennai @ 11 – Jayakumari Arumugham Oct 09 '17 at 11:13
Thanks. Please use the edit button to add this text in your answer. – bfontaine Oct 09 '17 at 12:15
i tried the same #1 and tried to enter the space but its not accepting – steve Oct 30 '18 at 06:49
yeah for me also its working there in rubular.com but i have used the same regex in my code export function specialAddressValidation(str) { return extract(str, '^[#.0-9a-zA-Z\s,-]+$') } but i am not getting – steve Oct 30 '18 at 07:30

This one is the winner. This will match the entire string or nothing if invalid characters are entered. – Ryan Walker Jan 15 '20 at 16:48

score 4 · Answer 5 · answered Aug 18 '13 at 12:12

I have successfully used:

    Dim regexString = New StringBuilder
    With regexString
        .Append("(?<h>^[\d]+[ ])(?<s>.+$)|")                'find the 2013 1st ambonstreet
        .Append("(?<s>^.*?)(?<h>[ ][\d]+[ ])(?<e>[\D]+$)|") 'find the 1-7-4 Dual Ampstreet 130 A
        .Append("(?<s>^[\D]+[ ])(?<h>[\d]+)(?<e>.*?$)|")    'find the Terheydenlaan 320 B3
        .Append("(?<s>^.*?)(?<h>\d*?$)")                    'find the 245e oosterkade 9
    End With

    Dim Address As Match = Regex.Match(DataRow("customerAddressLine1"), regexString.ToString(), RegexOptions.Multiline)

    If Not String.IsNullOrEmpty(Address.Groups("s").Value) Then StreetName = Address.Groups("s").Value
    If Not String.IsNullOrEmpty(Address.Groups("h").Value) Then HouseNumber = Address.Groups("h").Value
    If Not String.IsNullOrEmpty(Address.Groups("e").Value) Then Extension = Address.Groups("e").Value

The regex tries each alternative in turn: if one does not produce a result, it moves on to the next. If no result is found, none of the 4 formats were present.
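
For readers outside .NET, here is a rough Python port of the same idea, offered as a sketch rather than an exact equivalent. Python's `re` does not allow one group name to be reused across alternatives, so the four layouts are kept as separate patterns and tried in order. The names `LAYOUTS` and `split_address` are made up for this example:

```python
import re

# One pattern per address layout, tried in order; the first match wins.
LAYOUTS = [
    re.compile(r"^(?P<h>\d+ )(?P<s>.+)$"),              # 2013 1st ambonstreet
    re.compile(r"^(?P<s>.*?)(?P<h> \d+ )(?P<e>\D+)$"),  # 1-7-4 Dual Ampstreet 130 A
    re.compile(r"^(?P<s>\D+ )(?P<h>\d+)(?P<e>.*?)$"),   # Terheydenlaan 320 B3
    re.compile(r"^(?P<s>.*?)(?P<h>\d*?)$"),             # 245e oosterkade 9
]


def split_address(line):
    """Return (street, house_number, extension) with surrounding spaces trimmed."""
    for layout in LAYOUTS:
        match = layout.match(line)
        if match:
            groups = match.groupdict()
            extension = groups.get("e") or ""  # not every layout has an extension
            return groups["s"].strip(), groups["h"].strip(), extension.strip()
    return None


for line in ["2013 1st ambonstreet", "1-7-4 Dual Ampstreet 130 A",
             "Terheydenlaan 320 B3", "245e oosterkade 9"]:
    print(line, "->", split_address(line))
```

Note that the last layout is very permissive (both of its groups may be empty), so in this port it acts as a catch-all for single-line input.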

score 4 · Answer 6 · edited Mar 14 '19 at 13:47

This one worked for me:

\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\.?

The source: https://www.codeproject.com/Tips/989012/Validate-and-Find-Addresses-with-RegEx
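
A small Python sketch of how this pattern might be used to pull a street out of a longer string (the language, the `re.IGNORECASE` flag and the helper name `find_street` are assumptions for illustration):

```python
import re

SUFFIXES = "Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St"
# IGNORECASE lets lower-case input such as "123 main st" match as well.
STREET = re.compile(
    r"\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:" + SUFFIXES + r")\.?",
    re.IGNORECASE,
)


def find_street(text):
    """Return the first street-like substring, or None if there is none."""
    match = STREET.search(text)
    return match.group(0) if match else None


print(find_street("1234 Avalon Blvd, Los Angeles, CA 90011"))
print(find_street("123 main st"))
print(find_street("no number here"))
```

Because the suffix list is closed, anything that ends in a word outside the list (for example "Place") is not found unless the list is extended.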

edited Mar 14 '19 at 13:47

Abderrahim Soubai-Elidrisi

answered Feb 08 '18 at 19:02

Francisco Goldenstein

Add a flag to make it case insensitive. – Francisco Goldenstein Oct 12 '18 at 17:22
You could add |Place – Nico Jan 14 '19 at 22:16
Or you can use "123 main st".title() to appropriately capitalise all first letters – AER Nov 15 '19 at 05:48
`"\\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Lane|Road|Boulevard|Drive|Street|Ave|Dr|Rd|Blvd|Ln|St)\\.?"` – Anurag Sharma Sep 16 '20 at 07:52

score 2 · Answer 7 · edited May 23 '17 at 11:54

Regex is a very bad choice for this kind of task. Try to find a web service or an address database or a product which can clean address data instead.

Address validation using Google Maps API

answered Apr 03 '14 at 09:20

Aaron Digulla

score 1 · Answer 8 · edited Jan 17 '21 at 05:57

As a simple one-line expression, I recommend this:

^([a-zA-Z0-9/\\'(),\-\s]{2,255})$

edited Jan 17 '21 at 05:57

June7

answered Jan 17 '21 at 05:29

SL BugBusters

Escape the hyphen, in a character class it defines a range. – Toto Jan 17 '21 at 11:20
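
Toto's point can be reproduced in Python (an assumption; the answer does not name a regex engine). In the sketch below, the class as originally posted, with the hyphen between the comma and `\s`, is rejected by `re.compile`. The range `A-z` is a separate problem: it also lets through the punctuation that sits between `Z` and `a` in ASCII. The `fixed` pattern is one possible correction, not necessarily what the author intended:

```python
import re

originally_posted = r"^([a-zA-z0-9/\\''(),-\s]{2,255})$"
try:
    re.compile(originally_posted)
    compiled_ok = True
except re.error as exc:
    # ",-\s" is read as a range whose end is a character class, which is invalid.
    compiled_ok = False
    print("rejected:", exc)

# "A-z" spans more than letters: it includes [ \ ] ^ _ and the backtick.
underscore_in_A_to_z = re.fullmatch(r"[A-z]", "_") is not None

# Assumed fix: A-Z instead of A-z, and the hyphen moved to the end of the class.
fixed = re.compile(r"^([a-zA-Z0-9/\\'(),\s-]{2,255})$")
print(compiled_ok, underscore_in_A_to_z, bool(fixed.match("12/4 O'Neil Road, Unit (B)")))
```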

score 1 · Answer 9 · answered Jul 31 '22 at 09:29

This one works well for me

^(\d+) ?([A-Za-z](?= ))? (.*?) ([^ ]+?) ?((?<= )APT)? ?((?<= )\d*)?$

Source : https://community.alteryx.com/t5/Alteryx-Designer-Discussions/RegEx-Addresses-different-formats-and-headaches/td-p/360147

Andy

micah · Answer 10 · 2022-08-05T13:56:06.447

I needed

STREET # | STREET | CITY | STATE | ZIP

So I wrote the following regex

[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}

This allows:

1-5 digits for the street number

1-4 words for the street description

1-3 words for the city

2 characters for the state

5 digits for the ZIP code

I also made the comma optional as a separator between street, city, state and ZIP.
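
As a rough check of the pattern, here is a short Python sketch (the language and the helper name `is_us_address` are assumptions, and the sample addresses are invented):

```python
import re

# Street number, street words, city words, 2-letter state, 5-digit ZIP.
US_ADDRESS = re.compile(
    r"[0-9]{1,5}( [a-zA-Z.]*){1,4},?( [a-zA-Z]*){1,3},? [a-zA-Z]{2},? [0-9]{5}"
)


def is_us_address(text: str) -> bool:
    """Return True if the whole string fits STREET # | STREET | CITY | STATE | ZIP."""
    return US_ADDRESS.fullmatch(text) is not None


for sample in [
    "123 Main St, Springfield, IL 62704",
    "123 Main St Springfield IL 62704",
    "Main Street, Springfield",
]:
    print(sample, "->", is_us_address(sample))
```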

edited Aug 05 '22 at 13:56

answered Aug 05 '22 at 13:49

micah

score 0 · Answer 11 · answered Jul 11 '20 at 18:55

Here is the approach I have taken to finding addresses using regular expressions:

A set of patterns is useful to find many forms that we might expect from an address starting with simply a number followed by set of strings (ex. 1 Basic Road) and then getting more specific such as looking for "P.O. Box", "c/o", "attn:", etc.

Below is a simple test in Python. The test finds all the addresses but not the last 4 items, which are company names. This example is not comprehensive, but it can be altered to suit your needs and catch examples you find in your data.

import re
strings = [
    '701 FIFTH AVE',
    '2157 Henderson Highway',
    'Attn: Patent Docketing',
    'HOLLYWOOD, FL 33022-2480',
    '1940 DUKE STREET',
    '111 MONUMENT CIRCLE, SUITE 3700',
    'c/o Armstrong Teasdale LLP',
    '1 Almaden Boulevard',
    '999 Peachtree Street NE',
    'P.O. BOX 2903',
    '2040 MAIN STREET',
    '300 North Meridian Street',
    '465 Columbus Avenue',
    '1441 SEAMIST DR.',
    '2000 PENNSYLVANIA AVENUE, N.W.',
    '465 Columbus Avenue',
    '28 STATE STREET',
    'P.O, Drawer 800889.',
    '2200 CLARENDON BLVD.',
    '840 NORTH PLANKINTON AVENUE',
    '1025 Connecticut Avenue, NW',
    '340 Commercial Street',
    '799 Ninth Street, NW',
    '11318 Lazarro Ln',
    'P.O, Box 65745',
    'c/o Ballard Spahr LLP',
    '8210 SOUTHPARK TERRACE',
    '1130 Connecticut Ave., NW, Suite 420',
    '465 Columbus Avenue',
    "BANNER & WITCOFF , LTD",
    "CHIP LAW GROUP",
    "HAMMER & ASSOCIATES, P.C.",
    "MH2 TECHNOLOGY LAW GROUP, LLP",
]

# Raw strings keep the backslashes intact for the regex engine.
patterns = [
    r"c/o [\w ]{2,}",
    r"C/O [\w ]{2,}",
    r"P\.O\. [\w ]{2,}",
    r"P\.O, [\w ]{2,}",
    r"[\w.]{2,5} BOX [\d]{2,8}",
    r"^[#\d]{1,7} [\w ]{2,}",
    r"[A-Z]{2,2} [\d]{5,5}",
    r"Attn: [\w]{2,}",
    r"ATTN: [\w]{2,}",
    r"Attention: [\w]{2,}",
    r"ATTENTION: [\w]{2,}",
]
total_count = len(strings)
found_count = 0
for string in strings:
    matched = False
    for pat_no, pattern in enumerate(patterns, start=1):
        match = re.search(pattern, string.strip())
        if match:
            print("Item found: " + match.group(0) + " | Pattern no: " + str(pat_no))
            matched = True
    if matched:
        # Count each string once, even if several patterns match it.
        found_count += 1

print("-- Total: " + str(total_count) + " Found: " + str(found_count))

score 0 · Answer 12 · answered May 26 '21 at 14:42

UiPath Academy training video lists this RegEx for US addresses (and it works fine for me):

\b\d{1,8}(-)?[a-z]?\W[a-z|\W|\.]{1,}\W(road|drive|avenue|boulevard|circle|street|lane|way|rd\.|st\.|dr\.|ave\.|blvd\.|cir\.|ln\.|rd|dr|ave|blvd|cir|ln)

6opko

score 0 · Answer 13 · answered Aug 31 '21 at 05:15

I had a different use case: find any addresses in logs and scold application developers (favourite part of a devops job). I had the advantage of having the word "address" in the pattern, but it should work without that if you have a specific field to scan.

\baddress.[0-9\\\/# ,a-zA-Z]+[ ,]+[0-9\\\/#, a-zA-Z]{1,}

Look for the word "address" - skip this if not applicable
Look for first part numbers, letters, #, space - Unit Number / street number/suite number/door number
Separated by a space or comma
Look for one or more of rest of address numbers, letters, #, space

Tested against :

1 Sleepy Boulevard PO, Box 65745
Suite #100 /98,North St,Snoozepura
Ave., New Jersey,
Suite 420 1130 Connect Ave., NW,
Suite 420 19 / 21 Old Avenue,
Suite 12, Springfield, VIC 3001
Suite#100/98 North St Snoozepura

This worked for me when there were street addresses with unit/suite numbers, with zip codes, or with only a street. It also didn't match IP addresses or MAC addresses, and it worked with extra spaces. This assumes users are normal people who separate the elements of a street address with a comma, hash sign, or space, and not psychopaths who use characters like "|" or ":"!
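
A minimal Python sketch of scanning log lines with this pattern (the language, the helper name `find_logged_address` and the sample log lines are all assumptions made up for illustration):

```python
import re

# The word "address", any one separator character, then the address parts.
ADDRESS_IN_LOG = re.compile(r"\baddress.[0-9\\\/# ,a-zA-Z]+[ ,]+[0-9\\\/#, a-zA-Z]{1,}")


def find_logged_address(line):
    """Return the matched text if the line seems to contain a street address."""
    match = ADDRESS_IN_LOG.search(line)
    return match.group(0) if match else None


log_lines = [
    "user address: Suite 12, Springfield, VIC 3001",
    "client address=192.168.0.1 connected",
    "mac address: 00:1A:2B:3C:4D:5E",
]
for line in log_lines:
    print(find_logged_address(line))
```

The IP and MAC lines are skipped because `.` and `:` are not in the character classes, so the pattern never finds a space or comma followed by more address characters.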

score 0 · Answer 14 · answered Oct 20 '21 at 13:56

For French addresses, and some international addresses too, I use this:

[\\D+ || \\d]+\\d+[ ||,||[A-Za-z0-9.-]]+(?:[Rue|Avenue|Lane|... etcd|Ln|St]+[ ]?)+(?:[A-Za-z0-9.-](.*)]?)

Jessé Filho

score 0 · Answer 15 · answered Nov 05 '21 at 23:59

I was inspired by the responses given here and came up with these 2 solutions. Both of them:

support optional uppercase
also support French

regex structure

numbers (required)
letters, chars and spaces
at least one common address keyword (required)
as many chars you want before the line break

definitions:

accuracy

capacity to detect real addresses without also matching something that merely looks like an address.

range

capacity to detect uncommon addresses.

Regex 1:

high accuracy
low range

/[0-9]+[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.)|(cir\.)|(blvd\.)|(hway\.)|(st\.)|(aut\.)|(ave\.)|(ln\.)|(rd\.)|(hw\.)|(dr\.)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i

regex 2:

low accuracy
high range

/[0-9]*[ |[a-zà-ú.,-]* ((highway)|(autoroute)|(north)|(nord)|(south)|(sud)|(east)|(est)|(west)|(ouest)|(avenue)|(lane)|(voie)|(ruelle)|(road)|(rue)|(route)|(drive)|(boulevard)|(circle)|(cercle)|(street)|(cer\.?)|(cir\.?)|(blvd\.?)|(hway\.?)|(st\.?)|(aut\.?)|(ave\.?)|(ln\.?)|(rd\.?)|(hw\.?)|(dr\.?)|(a\.))([ .,-]*[a-zà-ú0-9]*)*/i

Vijay Kesanupalli · Answer 16 · 2022-10-25T15:13:55.987

Here is my RegEx for address, city & postal validation rules

validation rules:

address - 1 to 40 characters long. Letters, numbers, space and . , : ' #

city - 1 to 19 characters long. Only alphabetic characters and spaces are allowed.

postalCode - required.
A USA ZIP code must be numeric only, with a minimum of 5 digits (9 digits if ZIP+4 is provided).
A Canadian postal code is a six-character string in the format A1A 1A1, where A is a letter and 1 is a digit. A space separates the third and fourth characters. It does not include the letters D, F, I, O, Q or U, and the first position does not make use of the letters W or Z.

address: ^[a-zA-Z0-9 .,#;:'-]{1,40}$

city: ^[a-zA-Z ]{1,19}$

usaPostal: ^([0-9]{5})(?:[-]?([0-9]{4}))?$

canadaPostal : ^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$
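
The two postal patterns can be exercised with a short Python sketch (the language, the helper name `is_valid_postal` and the sample codes are assumptions for illustration):

```python
import re

USA_POSTAL = re.compile(r"^([0-9]{5})(?:[-]?([0-9]{4}))?$")
CANADA_POSTAL = re.compile(r"^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$")


def is_valid_postal(code: str) -> bool:
    """Return True if the code fits either the USA or the Canadian pattern."""
    return bool(USA_POSTAL.fullmatch(code) or CANADA_POSTAL.fullmatch(code))


for code in ["12345", "12345-6789", "1234", "K1A 0B1", "D1A 1A1", "W1A 1A1"]:
    print(code, "->", is_valid_postal(code))
```

Both patterns are case-sensitive, so a lower-case Canadian code such as `k1a 0b1` would need to be upper-cased first or matched with a case-insensitive flag.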

score 0 · Answer 17 · answered Nov 05 '22 at 15:50

\b(\d{1,8}[a-z]?[0-9\/# ,a-zA-Z-]+[ ,]+[.0-9\/#, a-zA-Z]{1,})\n

David P

score 0 · Answer 18 · answered Feb 10 '23 at 05:54

A more dynamic alternative to @micah's answer would be the following:

(?'Address'(?'Street'[0-9][a-zA-Z\s]),?\s*(?'City'[A-Za-z\s]),?\s(?'Country'[A-Za-z])\s(?'Zipcode'[0-9]-?[0-9]))

It won't care about individual lengths of segments of code.

https://regex101.com/r/nuy7hB/1

score 0 · Answer 19 · answered Apr 21 '23 at 04:32

A more thorough regex for the many suffixes used by the US Postal Service:

const addressRegex = /\d+[ ](?:[A-Za-z0-9.-]+[ ]?)+(?:Avenue|Alley|Anex|Arcade|Bayou|Beach|Bend|Bluff|Bluffs|Bottom|Branch|Brook|Bridge|Burg|Burgs|Bypass|Camp|Canyon|Cape|Causeway|Parkway|Pkwy|Center|Centers|Circle|Circles|Cliff|Cliffs|Club|Corner|Cove|Creek|Crescent|Crest|Crossing|XING|Dale|Dam|Divide|Estate|Expressway|Expy|Express|Fall|Falls|Ferry|Field|Fields|Forest|Fork|Freeway|Garden|Gardens|Gateway|Glen|Glens|Green|Grove|Harbor|Harbors|Haven|Heights|Highway|Hill|Hills|Hollow|Inlet|Island|Islands|Isle|Junction|Key|Keys|Quay|Knoll|Knolls|Lake|Lakes|Land|Landing|Lock|Locks|Lodge|Loop|Mall|Manor|Manors|Meadow|Meadows|Mews|Mill|Mills|Mission|Motorway|Mountain|Mountains|Neck|Orchard|Oval|Overpass|Park|Parks|Parkway|Pass|Passage|Path|Pike|Plaza|Port|Ports|Ramp|Ranch|Ridge|River|Route|Row|Run|Shoal|Shore|Shores|Skyway|Spring|Springs|Square|Squares|SQ|Station|Stravenue|STRA|Stream|Summit|SMT|Terrace|TRCE|Throughway|TRWY|Trail|TRL|Tunnel|TUNL|Turnpike|TPKE|Underpass|UPAS|Union|UN|Unions|UNS|Valley|VLY|Viaduct|VIA|Village|View|Views|VW|Ville|Vista|VL|VIS|Walk|Walks|Way|Ways|Well|Wells|WL|WLS|Lane|Road|Boulevard|Drive|Street|Place|Ave|Dr|Rd|Blvd|Ln|St)\.?/gi;

const test = "1234 Avalon Blvd, Los Angeles, CA 90011, United States";
const isMatch = test.match(addressRegex);
if (isMatch) {
  console.log(`"${test}" is detected as an address!`);
} else {
  console.log(`"${test}" is NOT detected as an address!`);
}