I am trying to write a regex for New Zealand address validation. The address must start with a number, and the valid character set is case-insensitive: letters A to Z, digits 0-9, hyphen "-", forward slash "/", and the Maori accented vowels ā, ē, ī, ō, ū. The regex works in JavaScript to display an invalid-address error message, just not with the HTML5 form validation.

...

// JavaScript regex
var regex = /^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;

...

Because I am attempting to do this in BigCommerce and don't have access to edit the input I am applying the "pattern" HTML input attribute with JavaScript. I really did think it was as simple as stripping "/^" from the start of the regex and "$/" from the end of the regex when applying to the HTML pattern attribute:

...

/** @start JavaScript code for HTML5 form validation **/ 
let fulladdress = document.getElementById('addressLine1Input');

fulladdress.setAttribute("pattern", "\d[\/a-zĀ-ū0-9\s\,\'\-]*");

fulladdress.addEventListener('input', () => {
  fulladdress.setCustomValidity('');
  fulladdress.checkValidity();
});

fulladdress.addEventListener('invalid', () => {
  fulladdress.setCustomValidity('No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive');
});

/** @end JavaScript code for HTML5 form validation **/

...

HTML snippet:

...

<input id="addressLine1Input" name="shippingAddress.address1" placeholder="Enter your address" onFocus="geolocate()" type="text" class="form-control" onblur="validateAddress()" required>

...

I created a JSFiddle example; the lines of interest are 13-26 in the JavaScript pane.

This is an invalid address string:

Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

This is a valid address string:

1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

The form validation pops up once you enter an address and click the Submit button.

Thank you, I really appreciate the input from the community.

This code works perfectly for all the validation examples I want, if there is a way to use it with HTML5 tool tips and form validation that would serve as a very viable workaround:

var regex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;

...

function validateAddress() {
  var str = getValue();
  var match = str.match(regex);
  var tooltip = document.getElementById("notification");
  var msg = document.getElementById("msg");

  if (match && !match[1]) {

    // valid address
    msg.innerHTML = "<p>Address looks to be valid</p>";
    tooltip.style.display = 'none';

  } else {

    // invalid address
    msg.innerHTML = "<p>Invalid address (No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive)</p>";
    tooltip.style.display = 'block';

  }
}

...
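One possible way to surface this JavaScript check through the HTML5 validation tooltip is `setCustomValidity`. The following is only a sketch under assumptions: `ADDRESS_HINT`, `addressError` and `attachAddressValidation` are hypothetical names, and the regex is the one quoted above.

```javascript
// Regex taken from the question; everything else here is illustrative.
const addressRegex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;
const ADDRESS_HINT =
  "No PO Box or Private Bag address must start with a number, e.g. 1/311 Canaveral Drive";

// Returns "" when the value passes, otherwise the message to show.
function addressError(value) {
  const match = value.match(addressRegex);
  // match[1] is only set when the PO Box / Private Bag alternative matched.
  return match && !match[1] ? "" : ADDRESS_HINT;
}

// Hypothetical wiring: a non-empty custom validity message marks the field
// invalid, so the browser shows it as a tooltip when the form is submitted.
function attachAddressValidation(input) {
  input.addEventListener("input", () => {
    input.setCustomValidity(addressError(input.value));
  });
}
```

In the page this would be called as `attachAddressValidation(document.getElementById('addressLine1Input'))`. The empty field is left to the existing `required` attribute.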
TylerH
  • try this "\\d[\/a-zĀ-ū0-9\\s\\,'\\-]*" i think it will work. – Bappi Saha Sep 08 '22 at 04:39
  • Remember that ```\``` in a string is an escape character, so if you want the literal character ```\``` you need to escape it with, well, itself: `const re = /\d/; const str = "\\d";` – Mike 'Pomax' Kamermans Sep 08 '22 at 04:39
  • Thanks Bappi, I tried changing the regex for the HTML5 pattern to your suggestion, it allowed submission for both valid and invalid address strings, definitely progress just the invalid one should display the tool tip and the valid one should allow submission of the form see https://jsfiddle.net/jeremy_tactical/ef864oqp/384/ Line 16 of the JavaScript – Jeremy Leys Sep 08 '22 at 04:50
  • `fulladdress.setAttribute("pattern", "\\d[/a-zĀ-ū0-9\\s',-]*");` should work better since, as mentioned above, you have to escape your backslash and you don't need to escape `,` and `-` as they don't have any meaning in a regular expression. With the browser inspector, just go and check the *pattern* attribute of the input field and you'll see it should then be `pattern="\d[/a-zĀ-ū0-9\s',-]*"` as you wish. But I had a play with your JSFiddle and the Google autocompletion doesn't put the number at the beginning, leading to an invalid address. The regex might not be permissive enough :-/ – Patrick Janser Sep 08 '22 at 06:44
  • Oh, by the way, you are missing `[A-Z]` as there'll be some uppercase letters. But to be honest, your regex pattern will never help you validate the address. I can actually validate just `32` or `1,/a-asdf`, which are really not valid. I think that your Google Maps API will be a far safer way to validate that your address is in New Zealand. Thanks for sharing your code! It's a nice discovery for me :-) – Patrick Janser Sep 08 '22 at 07:00
  • @PatrickJanser thank you for your input, what would you suggest in terms of regex? Happy to test and take solutions advice. In terms of Google validation I am not that familiar with it, other than the options provided by autocomplete, which are too versatile for NZ postal service compliance. This is why I wanted to evaluate with regex to prompt the user to re-enter correctly. For instance "1/311", "311/1", "1-311", "311-1" are all valid; "Flat 1 311" is invalid. I am not worried about "1,/a-asdf" as all the address components are required and the user generally selects the address via autocomplete. – Jeremy Leys Sep 08 '22 at 22:22
  • @PatrickJanser your suggested regex ```fulladdress.setAttribute("pattern", "\\d[/a-zĀ-ū0-9\\s',-]*");``` returns invalid for both the valid and invalid address string examples I provided in my question. Thank you for giving it a go, I think I still need to do more testing and try other options. – Jeremy Leys Sep 08 '22 at 22:28
  • "PO Box" and "Private Bag" sub-strings are always the first sub-string at the beginning of an address string so requiring the first sub-string to be a number in the HTML pattern largely eliminates that anyway. In my JavaScript validation to show an error message I had more of a comprehensive regex for those that are interested, one option would be to invoke the HTML5 UI for form validation, just not sure how to do that, as that method for validation works perfectly. You can see the logic for that on line 32. https://jsfiddle.net/jeremy_tactical/ef864oqp/389/ – Jeremy Leys Sep 08 '22 at 22:38
  • My more comprehensive regex is ```var regex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i``` – Jeremy Leys Sep 08 '22 at 22:39

2 Answers

@PatrickJanser you legend, how do I award this to you mate? And thanks everyone else. Patrick, you are right: the pattern was missing [A-Z]. The pure JS version worked fine without it because it had the "/i" flag for case-insensitivity, e.g.

...

var regex = /^.*(po\s*box|private\s*bag).*$|^\d[\/a-zĀ-ū0-9\s\,\'\-]*$/i;

...

The answer for the HTML pattern is

...

fulladdress.setAttribute("pattern", "\\d[/a-zA-ZĀ-ū0-9\\s',-]*");

...
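The doubled backslashes matter because the pattern is passed as a JavaScript string, where `\d` collapses to a plain `d` before `setAttribute` ever sees it. A minimal sketch of the difference follows; the variable names and the `toRegExp` helper are illustrative, and `String.raw` is offered as an alternative rather than something used in the fiddle.

```javascript
// In a string literal the backslash is consumed by the string parser.
const broken = "\d[/a-zA-ZĀ-ū0-9\s',-]*";    // becomes d[/a-zA-ZĀ-ū0-9s',-]*
const escaped = "\\d[/a-zA-ZĀ-ū0-9\\s',-]*"; // keeps \d and \s
const raw = String.raw`\d[/a-zA-ZĀ-ū0-9\s',-]*`; // same text as `escaped`

// Hypothetical helper: anchor the source the way the pattern attribute does.
function toRegExp(source) {
  return new RegExp("^(?:" + source + ")$");
}

const sample = "1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand";
console.log(toRegExp(broken).test(sample));  // false: the pattern now starts with a literal "d"
console.log(toRegExp(escaped).test(sample)); // true
```

Either `escaped` or `raw` can be handed to `fulladdress.setAttribute("pattern", ...)`.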

As seen in the JSFiddle.

When testing with:

This is an invalid address string:

Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

This is a valid address string:

1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand

  • Thanks Jeremy! I'm glad I could help a bit. I didn't have the time to completely look at your sexy JSFiddle but I just noticed that the JS *regex* var and the *pattern* attribute of the field are not the same. Shouldn't the "po box" and "private bag" also be in the *pattern* attribute? I was also wondering why "po box" would be allowed without the rest of an address. In fact, I think I would have tried to use Google's autocomplete to extract the address parts (even if the number isn't at the beginning) in order to fill all your specific fields. Then these fields could be validated on submit. – Patrick Janser Sep 09 '22 at 07:24
  • [Your Google autocomplete stopped working since revision 370](https://jsfiddle.net/jeremy_tactical/ef864oqp/370) of your JSFiddle. [It was ok on version 369](https://jsfiddle.net/jeremy_tactical/ef864oqp/369). Did you disable it or did some changes break it? I tried to enter "Pully, senalèche 27" (which is my address in Switzerland) and then clicked on the Google's autocomplete item. It fills all the fields nicely and I found that just perfect! You then only need to validate with the *pattern* attributes that a street number or postal number are effectively digits. – Patrick Janser Sep 09 '22 at 07:30
  • @PatrickJanser this was the final solution I came up with that I have now deployed to production, thank you for your input https://stackoverflow.com/questions/73710326/how-to-create-your-own-custom-google-address-autocomplete-in-bigcommerce-one-ste/73808022#73808022 – Jeremy Leys Sep 27 '22 at 20:45

After the comments we have exchanged, I think there are several points to discuss.

A regular expression to validate an address may get complex

I am not really convinced that a regular expression can be used for the field in which the user will type their address with the Google autocomplete feature. Indeed, there are many cases to consider.

Let's take into consideration the fact that you would like to use the pattern attribute on the field itself and also use a JavaScript regular expression.

The pattern attribute already matches against the full input value. In fact, it's automatically wrapped between ^(?: and )$. The parentheses are there to avoid changing the behaviour of the | operator. This is transparent for us, and we cannot use the /pattern/modifiers syntax like in /[a-z]/i.
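To make the wrapping concrete, here is a rough approximation in plain JavaScript. `matchesPattern` is a hypothetical helper, not a browser API, and it leaves out whatever flags the browser applies; it only illustrates the anchoring, plus the fact that an attribute value which does not compile is ignored rather than rejecting input.

```javascript
// Hypothetical approximation of how the pattern attribute is evaluated.
function matchesPattern(patternSource, value) {
  let re;
  try {
    // The non-capturing group keeps "|" from escaping the anchors.
    re = new RegExp("^(?:" + patternSource + ")$");
  } catch (err) {
    // A pattern that is not a valid regex imposes no constraint at all.
    return true;
  }
  return re.test(value);
}

console.log(matchesPattern("a|b", "a"));  // true
console.log(matchesPattern("a|b", "ab")); // false: the whole value must match
console.log(/^a|b$/.test("ab"));          // true: without the group, "^a" alone matches
console.log(matchesPattern("\\d[", "anything")); // true: invalid pattern, so nothing is rejected
```

The last line may explain why a mistyped pattern can appear to accept every address.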

So, as explained above, the pattern attribute doesn't let us choose regex flags, which means there is no i flag for case-insensitive matching. At the time of writing, JS also doesn't accept inline modifiers such as (?i) to turn case-insensitivity on inside the expression. The Unicode side is different: as far as I can tell from the HTML specification, browsers compile the pattern in Unicode mode on their own (originally with the u flag, more recently with v), so \p{L}, which matches any letter of any language such as à, ã or é, should be usable there in current browsers. The catch is that Unicode mode is stricter about escapes: sequences like \, or \' are syntax errors, and a pattern that fails to compile is silently ignored, so every value is accepted. Note also that \w is equivalent to [a-zA-Z0-9_], so it is fine for English letters but not for the Ā you mentioned.

Now, if we use [\wĀ-ū] then we actually match every character between code points 256 and 363, including some like ŁĦŘ that I think you don't want. This is where Unicode \p{...} classes help, provided the browsers you target support them in the pattern attribute; the rest of this answer sticks to explicit ranges.
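A small comparison in plain JavaScript shows what the Ā-ū range lets through. The three expressions and their names are illustrative only; the explicit list of macron vowels is an assumption about what the question actually needs, and the last expression relies on the u flag of a JavaScript regex literal.

```javascript
// Range from the question: every code point from U+0100 to U+016B.
const rangeClass = /^[a-zA-ZĀ-ū]+$/;
// Explicit list: only the ten macron vowels used in Maori.
const macronVowels = /^[a-zA-ZāēīōūĀĒĪŌŪ]+$/;
// Any letter of any script (needs the u flag in a JS regex literal).
const anyLetter = /^\p{L}+$/u;

console.log(rangeClass.test("ŁĦŘ"));    // true: inside the range, probably unwanted
console.log(macronVowels.test("ŁĦŘ"));  // false
console.log(macronVowels.test("Māori")); // true
console.log(rangeClass.test("Totârä")); // false: â and ä sit below U+0100
console.log(anyLetter.test("Totârä"));  // true
```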

An address can contain a building name such as:
Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073

The user could be staying with someone, thus prefixing the address with c/o:
c/o James Bond, 007 Agent Street, London, Greater London, SW1A 2AA, United Kingdom.

I found these example addresses on the New Zealand Post website, and they have to be valid too.

Let's have a try with your pattern: https://regex101.com/r/R8Bjy4/1

We see that it's not really bulletproof, which is why I don't think it will be as easy as we might expect. I think that using Google's autocomplete and then validating the individual address components would probably be easier.

But for the exercise, let's try with a regex...

  • Matching any letter is more complicated without the unicode flag. But if we look at the Unicode tables we see that we can add some ranges, here À-ɏ and Ḁ-ỹ:

    This leads to [\/\wÀ-ɏḀ-ỹ .,'#-] in order to accept the slash, any ASCII letter, digit or underscore (\w), almost all accented Latin letters, a simple space (not the same as \s, which also includes newlines and tabs), dot, comma, single quote, hash sign and hyphen.

    I saw that you often wrote [\,\'\-] in your pattern. In fact it's not necessary to escape chars inside a class of chars [...] except for the "]" char, the backslash itself, and sequences that have a meaning like \s, \n or \d. The "." or "|" chars outside a class of chars should indeed be escaped as \. and \| respectively, but inside a class of chars you can write them directly. Example: to match any char of ".-,?|]" you'll use [.,?|\]-] as pattern. The hyphen is used for ranges, so if you want to match it literally, put it at the beginning or end of the class (or escape it as \-): [-\w] and [\w-] are equivalent and both match any letter, digit, underscore and hyphen. In JS, the slash is used to delimit the pattern from the flags, so you have to escape it outside a character class, and escaping it inside one as well is the safest habit.

  • The PO Box or Private Bag:

    1. Make it case-insensitive without the i flag and replace \s by a literal space, since we don't want to match a newline or a tab:
      PO Box => [pP][oO] +[bB][oO][xX]
      Private Bag => [pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]
    2. Add a mandatory number (which we capture for debugging purposes) and then match anything else after it:
      ^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*
  • Some street numbers contain letters or slashes but they must start with a digit: (\d[\/\w]*)

  • In front of the street number there may be a building name:

    1. We'll assume that the building name itself doesn't contain a comma. So it could be almost any char but not a comma: [\/\wÀ-ɏḀ-ỹ .'-]*
    2. It's then followed by a comma and probably some spaces. All of it is optional and it must be at the beginning of the address: ^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?
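Writing the [pP][oO] form by hand is error-prone, so the expansion can be generated. This is a small sketch, assuming a hypothetical helper named `caseInsensitiveSource`; it only reproduces the transformation described in the list above (each letter becomes a two-character class, each space becomes " +").

```javascript
// Hypothetical helper: regex source matching `phrase` in any letter case
// without relying on the i flag.
function caseInsensitiveSource(phrase) {
  return Array.from(phrase, (ch) => {
    if (ch === " ") return " +"; // one or more plain spaces, not \s
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    if (lower !== upper) return "[" + lower + upper + "]";
    // Anything else: escape it if it is special in a regex.
    return ch.replace(/[.*+?^${}()|[\]\\\/-]/, "\\$&");
  }).join("");
}

const poBoxOrBag = new RegExp(
  "^(" + caseInsensitiveSource("po box") + "|" +
  caseInsensitiveSource("private bag") + ") +(\\d+).*"
);

console.log(caseInsensitiveSource("PO Box")); // [pP][oO] +[bB][oO][xX]
console.log(poBoxOrBag.test("Po  BOX 42, Auckland")); // true
```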

Putting it together:

  • The PO Box or Private Bag:

    ^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*
    
  • The standard address with at least a number and an optional building prefix:

    ^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#-]*$
    

Testing it: https://regex101.com/r/P4bEVf/6

OK, it's working, but it's accepting too many invalid entries. As you can see, it's difficult to get something bulletproof. Yes, we can improve the regex, but I don't think it will be easy!
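For a quick check outside regex101, the two expressions can be run against the addresses from this thread in plain JavaScript. The `classify` function and its return labels are made up for this sketch; the regexes are the two given under "Putting it together".

```javascript
const poBoxOrPrivateBag =
  /^([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*/;
const streetAddress =
  /^([\/\wÀ-ɏḀ-ỹ .'-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#-]*$/;

// Hypothetical helper: says which of the two expressions accepted the value.
function classify(address) {
  if (poBoxOrPrivateBag.test(address)) return "po-box";
  if (streetAddress.test(address)) return "street";
  return "invalid";
}

const samples = [
  "1/311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand",
  "Flat 1 311 Point Chevalier Road, Point Chevalier, Auckland 1022, New Zealand",
  "Totârä Farm, 2/12543 Farm Road, RD 1, Outram 9073",
  "PO Box 1234, Auckland",
  "32",        // accepted although it is not a real address
  "1,/a-asdf", // accepted as well
];
for (const s of samples) console.log(classify(s), "<=", s);
```

The last two samples are the kind of entry that still slips through.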

And trying the same regex in the pattern attribute (the hyphen inside each character class is written as \- here, because current browsers compile the attribute with stricter Unicode-mode syntax that, as far as I know, rejects a bare trailing hyphen):

JavaScript:

let address = document.getElementById('address');
let log_ul = document.getElementById('log');

// Log each submitted value instead of sending the form anywhere.
document.getElementById('demo-form').addEventListener('submit', (e) => {
  e.preventDefault();
  let li = document.createElement('li');
  li.textContent = address.value;
  log_ul.appendChild(li);
});

CSS:

input[type="text"] {
  min-width: 30em;
}

HTML:

<form id="demo-form" action="">
  <input type="text" id="address" name="address"
         placeholder="Type your full address here"
         pattern="([pP][oO] +[bB][oO][xX]|[pP][rR][iI][vV][aA][tT][eE] +[bB][aA][gG]) +(\d+).*|([\/\wÀ-ɏḀ-ỹ .'\-]*, *)?(\d[\/\w]*)[\/\wÀ-ɏḀ-ỹ .,'#\-]*"
         title="Address format: '45 Street Name, 2000 City, Country' or 'PO Box 2365, City'" />
  <input type="submit" value="submit" id="submit">
</form>
<ul id="log">
</ul>
Patrick Janser