Your assumptions about valid postcodes is slightly out
A valid UK postcode is always either one or two letters followed by a number (e.g. LE1, L12, etc.).
In its simplest terms the valid formats for a UK Postcode are:
+----------+---------------------------------------------+----------+
| Format | Coverage | Example |
+----------+---------------------------------------------+----------+
| AA9A 9AA | WC postcode area; EC1–EC4, NW1W, SE1P, SW1 | EC1A 1BB |
+----------+---------------------------------------------+----------+
| A9A 9AA | E1W, N1C, N1P | W1A 0AX |
+----------+---------------------------------------------+----------+
| A9 9AA | B, E, G, L, M, N, S, W | M1 1AE |
| A99 9AA | | B33 8TH |
+----------+---------------------------------------------+----------+
| AA9 9AA | All other postcodes | CR2 6XH |
| AA99 9AA | | DN55 1PT |
+----------+---------------------------------------------+----------+
Each of which you can define a pattern match for:
AA9A 9AA - [A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9A 9AA - [A-Z][0-9][A-Z] [0-9][A-Z][A-Z]
A9 9AA - [A-Z][0-9] [0-9][A-Z][A-Z]
A99 9AA - [A-Z][0-9][0-9] [0-9][A-Z][A-Z]
AA9 9AA - [A-Z][A-Z][0-9] [0-9][A-Z][A-Z]
AA99 9AA - [A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]
For something as re-usable as this, I think it is worth storing, so I would create a table for it:
CREATE TABLE dbo.SimplePostCodeValidation
(
PostCode VARCHAR(8) NOT NULL,
Pattern VARCHAR(50) NOT NULL
);
INSERT dbo.SimplePostCodeValidation (PostCode, Pattern)
VALUES
('AA9A 9AA', '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
('A9A 9AA', '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'),
('A9 9AA', '[A-Z][0-9] [0-9][A-Z][A-Z]'),
('A99 9AA', '[A-Z][0-9][0-9] [0-9][A-Z][A-Z]'),
('AA9 9AA', '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]'),
('AA99 9AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
-- REPEAT THE POSTCODES WITHOUT SPACES
('AA9A9AA', '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
('A9A9AA', '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]'),
('A99AA', '[A-Z][0-9][0-9][A-Z][A-Z]'),
('A999AA', '[A-Z][0-9][0-9][0-9][A-Z][A-Z]'),
('AA99AA', '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]'),
('AA999AA', '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]');
Now you can easily validate your postcodes:
DECLARE @T TABLE (Postcode VARCHAR(8));
INSERT @T (PostCode)
SELECT PostCode
FROM dbo.SimplePostCodeValidation
UNION ALL
SELECT PostCode
FROM (VALUES ('123456'), (''), ('TEST')) t (PostCode);
SELECT t.PostCode, IsValid = CASE WHEN pc.PostCode IS NULL THEN 0 ELSE 1 END
FROM @T AS t
LEFT JOIN SimplePostCodeValidation AS pc
ON t.PostCode LIKE pc.Pattern;
Which returns:
PostCode IsValid
----------------------
AA9A 9AA 1
A9A 9AA 1
A9 9AA 1
A99 9AA 1
AA9 9AA 1
AA99 9AA 1
123456 0
0
TEST 0
To apply this to your situation you would use:
SELECT CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END,
TotalValue = SUM(totalValue)
FROM Customer AS c
LEFT JOIN SimplePostCodeValidation AS pc
ON t.PostCode LIKE pc.Pattern;
GROUP BY CASE WHEN pc.PostCode IS NULL THEN 'Invalid' ELSE LEFT(c.postcode, 2) END;
If you want to get more complicated, there are actually further limitations to what is a valid postcode, e.g. if it is the pattern A9 9AA
then the first letter can only be one of (B, E, G, L, M, N, S, W). The guidelines set out on wikipedia state:
- Areas with only single-digit districts: BR, FY, HA, HD, HG, HR, HS, HX, JE, LD, SM, SR, WC, WN, ZE (although WC is always subdivided by a further letter, e.g. WC1A).
- Areas with only double-digit districts: AB, LL, SO.
- Areas with a district '0' (zero): BL, BS, CM, CR, FY, HA, PR, SL, SS (BS is the only area to have both a district 0 and a district 10).
- The following central London single-digit districts have been further divided by inserting a letter after the digit and before the space: EC1–EC4 (but not EC50), SW1, W1, WC1, WC2, and part of E1 (E1W), N1 (N1C and N1P), NW1 (NW1W) and SE1 (SE1P).
- The letters QVX are not used in the first position.
- The letters IJZ are not used in the second position.
- The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
- The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
- The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
- Post code sectors are one of ten digits: 0 to 9 with 0 only used once 9 has been used in a post town, save for Croydon and Newport (see above).
Since SQL Server does not support full regex it gets a bit more complicated to account for all these caveats. If you really wanted fool proof validation I would be inclined to use the regex from answers this question, and use a CLR function to validate the postcode.