Can't figure out how to use expressions to validate a Canadian postal code in Python

Question

I'm trying to make a program that takes a postal code input from the user and checks to see if it's valid. So far I have:

postalCode = input("Postal code: ")
postalCode = postalCode.replace(" ", "")
postalCode = postalCode.lower()

letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
numbers = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

# once the space is removed, a code must be exactly 6 characters long
valid = len(postalCode) == 6

if valid:
    # even indexes must be letters, odd indexes must be digits
    for i in range(0, len(postalCode), 2):
        if postalCode[i] not in letters or postalCode[i+1] not in numbers:
            valid = False
            break

if valid:
    print("Valid postal code.")
else:
    print("Not a valid postal code.")

The code runs fine, but I know using expressions would be much more practical; I just haven't been able to figure out how they work.

The Canadian postal code format is: L/N/L N/L/N (L = letter, N = number)

Thanks

You mean using regular expressions, I think? `import re`, then `len(re.findall(pattern, string_to_search))`: make the pattern for the postal code, then check whether `len` is greater than 0; that should work. — it's-yer-boy-chet, Dec 13 '17 at 21:43
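The `findall`/`len` idea reports a hit whenever the pattern occurs anywhere in the input, so a valid code surrounded by junk still passes. A minimal sketch contrasting it with `re.fullmatch`, which only succeeds when the whole string matches. The pattern simply transcribes the question's L/N/L N/L/N format (any letter A-Z, optional space), and the helper names `found_anywhere` and `is_well_formed` are hypothetical:

```python
import re

# L/N/L N/L/N from the question: letter, digit, letter, optional space, digit, letter, digit.
# Assumption: any letter A-Z is accepted here; the answers restrict the letter sets further.
PATTERN = r"[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]"


def found_anywhere(code):
    """The findall/len approach: True if the pattern occurs somewhere in the string."""
    return len(re.findall(PATTERN, code.upper())) > 0


def is_well_formed(code):
    """Stricter: the entire (stripped, upper-cased) input must match the pattern."""
    return re.fullmatch(PATTERN, code.strip().upper()) is not None


for sample in ["k1a 0b1", "K1A0B1", "xxK1A 0B1yy", "K1A 0B"]:
    print(f"{sample!r:15} findall: {found_anywhere(sample)!s:5}  fullmatch: {is_well_formed(sample)}")
```

With `"xxK1A 0B1yy"` the two checks disagree: `findall` finds the embedded code, while `fullmatch` rejects the extra characters.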
https://stackoverflow.com/questions/15774555/efficient-regex-for-canadian-postal-code-function — it's-yer-boy-chet, Dec 13 '17 at 21:46
NOT a duplicate of either [canadian-postal-code-validation-python-regex](https://stackoverflow.com/questions/29906947/canadian-postal-code-validation-python-regex) or [JS: efficient-regex-for-canadian-postal-code-function](https://stackoverflow.com/questions/15774555/efficient-regex-for-canadian-postal-code-function). The second is JavaScript and uses forbidden characters; the first has no accepted answer (though it does contain a valid regex solution). — Patrick Artner, Dec 13 '17 at 22:14

Patrick Artner · Answer 1 · 2021-07-16T05:15:57.497

No-regex solution:

Get your facts straight: a-z is wrong. Some letters are omitted because they look too similar to other characters, and the first letter identifies the province or region:

A Newfoundland and Labrador   B Nova Scotia                 C Prince Edward Island
E New Brunswick               G Eastern Québec              H Montréal and Laval
J Western Québec              K Eastern Ontario             L Central Ontario
M Greater Toronto             N Southwestern Ontario        P Northern Ontario
R Manitoba                    S Saskatchewan                T Alberta
V British Columbia            X Northwest Terr. / Nunavut   Y Yukon
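Because the first letter identifies the province or region, the table above can also serve as a lookup. A small sketch, assuming a hypothetical `region_for` helper and the table's regions with English names; note that X covers both the Northwest Territories and Nunavut, so one letter does not always pin down a single territory:

```python
# First letter of a Canadian postal code -> province/region (from the table above).
FIRST_LETTER_REGION = {
    "A": "Newfoundland and Labrador",
    "B": "Nova Scotia",
    "C": "Prince Edward Island",
    "E": "New Brunswick",
    "G": "Eastern Québec",
    "H": "Montréal and Laval",
    "J": "Western Québec",
    "K": "Eastern Ontario",
    "L": "Central Ontario",
    "M": "Greater Toronto",
    "N": "Southwestern Ontario",
    "P": "Northern Ontario",
    "R": "Manitoba",
    "S": "Saskatchewan",
    "T": "Alberta",
    "V": "British Columbia",
    "X": "Northwest Territories / Nunavut",
    "Y": "Yukon",
}


def region_for(postal_code):
    """Return the region for the code's first letter, or None if that letter never starts a code."""
    first = postal_code.strip()[:1].upper()  # [:1] avoids an IndexError on empty input
    return FIRST_LETTER_REGION.get(first)


print(region_for("h0h 0h0"))  # Montréal and Laval
print(region_for("W1A 1A1"))  # None: W never starts a postal code
```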

Code:

def CheckCanadianPostalcodes(p, strictCapitalization=False, fixSpace=True):
    '''returns a Tuple of (boolean, string):
    - (True, postalCode) or 
    - (False, error message) 
    By default, lower- and upper-case characters are allowed
    and a missing middle space is inserted.'''

    pc = p.strip()                   # copy p, strip whitespaces front/end
    if fixSpace and len(pc) == 6:
        pc = pc[0:3] + " " + pc[3:]    # if allowed / needed insert missing space

    nums = "0123456789"              # allowed numbers
    alph = "ABCEGHJKLMNPRSTVWXYZ"    # allowed characters (WZ handled below)
    mustBeNums = [1,4,6]             # index of number
    mustBeAlph = [0,2,5]             # index of character (WZ handled below)

    illegalCharacters = [x for x in pc 
                         if x not in (nums + alph.lower() + alph + " ")]

    if strictCapitalization:
        illegalCharacters = [x for x in pc
                             if x not in (alph + nums + " ")]

    if illegalCharacters:
        return(False, "Illegal characters detected: " + str(illegalCharacters))

    postalCode = [x.upper() for x in pc]          # copy to uppercase list

    if len(postalCode) != 7:                      # length-validation
        return (False, "Length not 7")

    for idx, ch in enumerate(postalCode):          # loop over all indexes
        if ch in nums and idx not in mustBeNums:   # it is a number, check index
            return (False, "Format not 'ADA DAD'")
        elif ch in alph and idx not in mustBeAlph: # it is a letter, check index
            return (False, "Format not 'ADA DAD'") # alpha / digit
        elif ch == " " and idx != 3:               # space only allowed in the middle
            return (False, "Format not 'ADA DAD'")

    if postalCode[0] in "WZ":                      # no W or Z first char
        return (False, "Cant start with W or Z")

    return (True,"".join(postalCode))    # yep - all good

Testing:

testCases = [(True,"A9A 9A9"), (True,"a9a 9a9"), (True,"A9A9A9"),
             (True,"a9a9a9"), (False,"w9A 9A9"), (False,"z9a 9a9"), 
             (False,"a9a 9!9")]

for t in testCases:
    pc = CheckCanadianPostalcodes(t[1])    # returns a (bool, str) tuple, see docstring
    assert pc[0] == t[0], "Error in assertion: " + str(t) + " became " + str(pc)
    print(t[1], " => ", pc)

pp = input("Postal code: ") 
print(CheckCanadianPostalcodes(pp))    # returns a (bool, str) tuple, see docstring

Output:

A9A 9A9  =>  (True, 'A9A 9A9')
a9a 9a9  =>  (True, 'A9A 9A9')
A9A9A9  =>  (True, 'A9A 9A9')
a9a9a9  =>  (True, 'A9A 9A9')
w9A 9A9  =>  (False, 'Cant start with W or Z')
z9a 9a9  =>  (False, 'Cant start with W or Z')
a9a 9!9  =>  (False, "Illegal characters detected: ['!']")
Postal code: b2c3d4
(False, "Illegal characters detected: ['d']")

The regex answer (not the accepted one) on canadian-postal-code-validation-python-regex, linked in the comments under the question, delivers the correct regex.

Number of possible postal codes (from Wikipedia):

Postal codes do not include the letters D, F, I, O, Q or U, and the first position also does not make use of the letters W or Z. [...] As the Canada Post reserves some FSAs for special functions, such as for test or promotional purposes, (e.g. the H0H 0H0 for Santa Claus, see below) as well as for sorting mail bound for destinations outside Canada. [...]

which leaves you with ABCEGHJKLMNPRSTVWXYZ for every letter position, but only ABCEGHJKLMNPRSTVXY (no W or Z) for the first character.
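The arithmetic behind the heading can be checked directly. A sketch that derives both letter sets from the excluded letters in the quote and counts how many strings fit the L N L N L N format. This is only an upper bound on well-formed codes; as the quote notes, many are reserved or unused, so it is not the number of codes actually in use:

```python
import string

# Letters never used anywhere in a postal code (per the Wikipedia quote above).
EXCLUDED = set("DFIOQU")

any_position = "".join(c for c in string.ascii_uppercase if c not in EXCLUDED)
first_position = "".join(c for c in any_position if c not in "WZ")

print(any_position)    # ABCEGHJKLMNPRSTVWXYZ
print(first_position)  # ABCEGHJKLMNPRSTVXY

# Three letter slots (the first one restricted) and three digit slots.
combinations = len(first_position) * 10 * len(any_position) * 10 * len(any_position) * 10
print(combinations)    # 7200000
```

That is 18 · 10 · 20 · 10 · 20 · 10 = 7,200,000 syntactically valid codes.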

Edit: Incorporated change suggestions by JL Peyret:

- allow a missing middle space
- make it clearer when upper/lower case is OK

Very nice. +1. 2 remarks: it's not obvious that you accept lowercase and I think it would also be nice to allow ADADAD, without the space. In both cases, they would have to be converted back to ADA DAD at some point, but it's reasonable not to reject user input based on capitalization or on a missing, but entirely deducible, space. — JL Peyret, Dec 15 '17 at 18:06
@JLPeyret thanks for the suggestion; I adapted the code to accept upper/lower case and a missing space by default. Both are normalized in the result. You are a Canadian citizen, so your wish regarding this is my command ;) — Patrick Artner, Dec 15 '17 at 22:59

score 2 · Answer 2 · edited Dec 16 '17 at 08:07


Based on your question you could use:

import re

postalCode = input("Postal code: ")

# fullmatch requires the whole input to match, so trailing characters are rejected
match = re.fullmatch(r'[A-Z][0-9][A-Z]\s[0-9][A-Z][0-9]', postalCode)

if match:
    print('Valid postal code')
else:
    print('Invalid postal code')

You could also use the `re.sub` method to remove the space and then match the repeating letter-digit sequence, so you don't have to repeat the pattern as I did above.
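One way to follow that suggestion, sketched under the assumption that it means "strip whitespace, then repeat the letter-digit pair with a quantifier": `re.sub` removes the whitespace and `{3}` replaces writing the pair out three times. Like the answer's pattern it accepts any upper-case letter A-Z; `normalize_and_check` is a hypothetical helper name:

```python
import re

# One letter followed by one digit, repeated exactly three times: A9A9A9.
PAIR_TIMES_THREE = re.compile(r"(?:[A-Z][0-9]){3}")


def normalize_and_check(raw):
    """Remove all whitespace, then require the whole string to be (letter, digit) x 3."""
    compact = re.sub(r"\s+", "", raw)
    return PAIR_TIMES_THREE.fullmatch(compact) is not None


print(normalize_and_check("A1B 2C3"))  # True
print(normalize_and_check("A1B2C3"))   # True
print(normalize_and_check("A1B 2C"))   # False
```

Because every whitespace character is removed, oddly spaced input such as `"A 1B2 C3"` is accepted too; that is the trade-off of normalizing before matching.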

answered Dec 13 '17 at 22:29 by Franndy Abreu; edited Dec 16 '17 at 08:07 by Patrick Artner

The regex is correct based on the information provided in the question; unfortunately, not all letters A-Z are allowed, so you are using the wrong character classes for validation. Also, your regex is not case-insensitive, while the code in the question is. – Patrick Artner, Dec 16 '17 at 07:55

Bill Bell · Answer 3 · 2017-12-16T15:53:53.413

I began by constructing two strings: one containing the alphabetic characters that may be used in any (legal) position of a postal code, and one containing the alphabetic characters that may be used in the first position.

>>> any_position = 'ABCEGHJKLMNPRSTVWXYZ'
>>> first_position = 'ABCEGHJKLMNPRSTVXY'

These few lines of code display the regular expression and its behaviour on a few trial examples. If no <_sre.SRE_Match object; ...> line appears under a call to the regex, the call returned None, meaning the test failed for one reason or another. (Python 3.7 and later display the object as <re.Match object; ...>.)

Edit: I should have explained what the regex does.

- The caret ('^') at the beginning insists that the match begin with the first character of the subject string. Likewise, the dollar ('$') insists that the match end with the last character of the subject string (i.e., the postal code); in Python it also matches just before a single trailing newline.
- [ABCEGHJKLMNPRSTVXY] 'accepts' any single character within this set of characters.
- [0-9] 'accepts' any single digit.
- The blank 'accepts' a single space character.
- And so on for the remaining positions.

Taken together, these specifications constitute a single Canadian postal code, in upper case.
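Since the answer starts from two strings, the regex can be assembled from them rather than typed by hand, which keeps the character classes in sync with the letter lists. A sketch; the f-string assembly is an assumption, not necessarily how the answer built its pattern:

```python
import re

any_position = 'ABCEGHJKLMNPRSTVWXYZ'
first_position = 'ABCEGHJKLMNPRSTVXY'

# ^ start, [first] letter, digit, [any] letter, space, digit, [any] letter, digit, $ end.
pattern = f'^[{first_position}][0-9][{any_position}] [0-9][{any_position}][0-9]$'
postal_code_re = re.compile(pattern)

print(pattern)                                # same text as the hand-written regex
print(bool(postal_code_re.match('H0H 0H0')))  # True
print(bool(postal_code_re.match('W0A 0A0')))  # False: W not allowed first
```

If stray trailing newlines must be rejected as well, `postal_code_re.fullmatch(...)` is stricter than `match` with `$`.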

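Postal codes containing lower-case alphabetic characters can be accepted if you convert them to upper case first (for example with str.upper()). Alternatively, as Patrick Artner suggests, pass re.I (re.IGNORECASE) as the flags argument, either to re.compile(pattern, re.IGNORECASE) or to the module-level re.match(pattern, string, re.IGNORECASE). The second argument of a compiled pattern's .match() is a start position, not a flag.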
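A sketch of both options, reusing the answer's pattern unchanged:

```python
import re

PATTERN = r'^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ] [0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$'

# Option 1: give the flag to re.compile; the character classes then match either case.
postal_code_re = re.compile(PATTERN, re.IGNORECASE)
print(bool(postal_code_re.match('h0h 0h0')))  # True: lower case now accepted
print(bool(postal_code_re.match('w0a 0a0')))  # False: w is still rejected in first position

# Option 2: keep the pattern case-sensitive and upper-case the input before matching.
strict_re = re.compile(PATTERN)
print(bool(strict_re.match('h0h 0h0')))          # False
print(bool(strict_re.match('h0h 0h0'.upper())))  # True
```

Upper-casing the input has the side benefit of producing a normalized code you can store or display.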

>>> import re
>>> postal_code_re = re.compile(r'^[ABCEGHJKLMNPRSTVXY][0-9][ABCEGHJKLMNPRSTVWXYZ] [0-9][ABCEGHJKLMNPRSTVWXYZ][0-9]$')
>>> postal_code_re.match('H0H 0H0')
<_sre.SRE_Match object; span=(0, 7), match='H0H 0H0'>
>>> postal_code_re.match('A0A 0A0')
<_sre.SRE_Match object; span=(0, 7), match='A0A 0A0'>
>>> postal_code_re.match('W0A 0A0')
>>> postal_code_re.match('Q0A 0A0')
>>> postal_code_re.match('H0H 0Q0')

It might be important to mention that this approach tests only the format of a code. That is not sufficient to test its validity, since many, many codes are not in use. For small-volume testing, one could check whether a code is actually in use, or even whether it's in a valid format, using one of the tools at https://www.canadapost.ca/web/en/pages/tools/default.page together with web-scraping techniques.

the shortness of the regex solution makes mine look a tad long :) +1 - but at least I'll output some error hints. — Patrick Artner, Dec 16 '17 at 07:58
the question's code is case-insensitive, though; you could add the re.I flag for that — Patrick Artner, Dec 16 '17 at 08:04
@PatrickArtner: Now that I look back at the question, I think the principal flaw in my answer is that I didn't explain how this regex works, which is what the OP was uncertain about. As often happens here on SO, who knows what the OP intends to do with any results we might offer? — Bill Bell, Dec 16 '17 at 15:41
