1

I am having problems with the following UK postcode regex:

([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})

It works for UK postcodes as intended, e.g.

AB11AB

However, it also seems to match UUIDs, e.g.

c25d4f64-2336-4a5d-b94c-14dc12xxxa58

Is there any way to make the regular expression ignore UUIDs?

Please find an example here:

https://regex101.com/r/dI6gD9/19
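
To see where the unwanted match comes from, the pattern can be run with `Matcher.find()` against the UUID on its own. This is only a small sketch; the class and method names (`UuidFalsePositive`, `firstMatch`) are made up for illustration. Because the pattern is unanchored, `find()` is free to start in the middle of a token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UuidFalsePositive {
    // The postcode pattern from the question, unchanged (backslashes doubled for Java).
    static final Pattern POSTCODE = Pattern.compile(
        "([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
        + "|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s?[0-9][A-Za-z]{2})");

    // Returns the first substring the pattern finds, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = POSTCODE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("c25d4f64-2336-4a5d-b94c-14dc12xxxa58"));
        System.out.println(firstMatch("AB11AB"));
    }
}
```

On the sample UUID this finds dc12xx, a fragment inside the last group that happens to have a postcode-like shape (two letters and a digit, followed by digit-letter-letter).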

melpomene
mh377
  • Thanks, that is where I got the UK postcode regex from, but it also seems to match UUIDs and I would like to ignore these if possible – mh377 Jul 26 '19 at 15:51
  • That regex is overcomplicated for no reason. – melpomene Jul 26 '19 at 16:03
  • As far as I can tell this whole mess reduces to just `GIR ?0AA|[A-Z][A-HJ-Y]?[0-9][A-Z0-9]? ?[0-9][A-Z]{2}` (assuming the `i` flag is set to make the match case insensitive). Technically the first ` ?` (optional space) is just ` ` (required space) and the second ` ?` is `\s?` (optional whitespace character) in the original regex, but I don't see why they should be treated differently. – melpomene Jul 26 '19 at 16:08
  • It's taken from here: https://stackoverflow.com/questions/164979/regex-for-matching-uk-postcodes – mh377 Jul 26 '19 at 16:14
  • See [this answer](https://stackoverflow.com/a/51885364/1848654), which comes to the same result as my comment. – melpomene Jul 26 '19 at 16:23
  • It's just not worth validating UK postcodes. What possible benefit does it give you? Put a maximum length of 8 chars and let people type what they want. – DavidG Jul 26 '19 at 16:31
  • It's not for validation; it's to mask log messages – mh377 Dec 18 '22 at 18:38
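
For reference, the simplified pattern suggested in the comments could be tried in Java roughly as follows. This is a minimal sketch: `Pattern.CASE_INSENSITIVE` stands in for the `i` flag, and the class and method names are hypothetical. `Matcher.matches()` only succeeds when the whole input matches, so this form suits checking a single string, not scanning a log.

```java
import java.util.regex.Pattern;

class SimplifiedPostcode {
    // Simplified pattern from the comment thread; case-insensitivity is set via a flag.
    static final Pattern SIMPLIFIED = Pattern.compile(
        "GIR ?0AA|[A-Z][A-HJ-Y]?[0-9][A-Z0-9]? ?[0-9][A-Z]{2}",
        Pattern.CASE_INSENSITIVE);

    // True only if the entire string has the shape of a postcode.
    static boolean isPostcode(String s) {
        return SIMPLIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "AB11AB", "ab1 1ab", "GIR 0AA", "c25d4f64-2336-4a5d-b94c-14dc12xxxa58"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isPostcode(s));
        }
    }
}
```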

3 Answers

1

Option 1

One option is to add start and end anchors so that the UUIDs fail to match, and to change the capturing groups to non-capturing ones, if that would be OK. The whole alternation is wrapped in one group so that ^ and $ apply to both alternatives:

^(?:(?:[Gg][Ii][Rr]\s+0[Aa]{2})|(?:(?:(?:[A-Za-z][0-9]{1,2})|(?:(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(?:(?:[A-Za-z][0-9][A-Za-z])|(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s*[0-9][A-Za-z]{2}))$

The expression can most likely be simplified (for example, by removing redundant groups). I have also allowed extra whitespace (\s+ and \s*), just in case.

DEMO 1


Option 2

Another option is to add word boundaries, which I'm guessing would make it very unlikely to match a UUID in our data. We can also add the i flag to make the match case-insensitive:

(?i)(?:\bgir\b\s+\b0a{2}\b)|\b(?:[a-z][0-9]{1,2}|[a-z][a-hj-y][0-9]{1,2}|[a-z][0-9][a-z]|[a-z][a-hj-y][0-9][a-z]?)\s*[0-9][a-z]{2}\b

DEMO 2
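
As a rough check of Option 2 in Java, the sketch below collects every match in a single log-like line. The sample line and the names `WordBoundaryPostcodes` and `findAll` are assumptions made for illustration; the pattern is the one shown above with backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryPostcodes {
    // Option 2 pattern: (?i) makes it case-insensitive, \b marks word boundaries.
    static final Pattern POSTCODE = Pattern.compile(
        "(?i)(?:\\bgir\\b\\s+\\b0a{2}\\b)|\\b(?:[a-z][0-9]{1,2}|[a-z][a-hj-y][0-9]{1,2}"
        + "|[a-z][0-9][a-z]|[a-z][a-hj-y][0-9][a-z]?)\\s*[0-9][a-z]{2}\\b");

    // Collects every substring of the text that the pattern matches.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = POSTCODE.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical log line containing one UUID and one postcode.
        String line = "id=c25d4f64-2336-4a5d-b94c-14dc12xxxa58 postcode=AB11AB";
        System.out.println(findAll(line));
    }
}
```

The hex groups of a standard UUID are 4, 8, or 12 characters long, while a postcode without a space is 5 to 7 characters, so a match that must begin and end on a word boundary cannot cover a whole UUID group.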

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostcodeTest {
    public static void main(String[] args) {
        // Option 1 pattern, with the whole alternation wrapped in one group
        // so that ^ and $ apply to every alternative.
        final String regex = "^(?:(?:[Gg][Ii][Rr]\\s+0[Aa]{2})|(?:(?:(?:[A-Za-z][0-9]{1,2})|(?:(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(?:(?:[A-Za-z][0-9][A-Za-z])|(?:[A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s*[0-9][A-Za-z]{2}))$";
        // One UUID and one postcode, each on its own line.
        final String string = "c25d4f64-2336-4a5d-b94c-14dc12xxxa58\n"
             + "AB11AB";

        // MULTILINE lets ^ and $ match at line breaks, not only at the ends of the input.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // Print every line that matches in full.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
        }
    }
}

The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it. In this link, you can also watch how it matches against some sample inputs.

RegEx Circuit

jex.im visualizes regular expressions:

[image: jex.im visualization of the expression]

Emma
0

You are using the correct regex, the one issued by the UK government.

Below I have added examples of how to use it:

Match full string:

When matching against a full string, don't use the global flag, because it makes the engine look for every occurrence within the string rather than testing whether the whole string matches the regex.

So don't use the global and multi-line flags.

Notice the gm part in

/your_regex/gm

Try it in this example on regex101.com, where I have already disabled the global and multi-line flag for you.

Match in log file:

For log files, add word boundaries (\b) around your regex, grouping it so that the boundaries apply to every alternative.

Notice the \b parts in

/\b(?:your_regex)\b/gm

Try it in this example which shows this behaviour in an example log file.
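
In Java, the log-file variant might look like the sketch below. It assumes the original postcode pattern from the question and that each log line is available as a `String`; `LogMasker`, `mask`, and the `***` replacement are made-up names and values. The alternation is wrapped in `(?:...)` so that both `\b` assertions apply to every alternative.

```java
import java.util.regex.Pattern;

class LogMasker {
    // Postcode pattern from the question, wrapped as \b(?: ... )\b.
    static final Pattern POSTCODE = Pattern.compile(
        "\\b(?:([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})"
        + "|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\\s?[0-9][A-Za-z]{2}))\\b");

    // Replaces every postcode-shaped word in the line with a fixed mask.
    static String mask(String line) {
        return POSTCODE.matcher(line).replaceAll("***");
    }

    public static void main(String[] args) {
        // Hypothetical log line: the UUID should survive, the postcode should be masked.
        System.out.println(mask("id=c25d4f64-2336-4a5d-b94c-14dc12xxxa58 postcode=AB11AB"));
    }
}
```

Java regex strings have no surrounding slashes or trailing flags; flags such as multi-line are passed to `Pattern.compile` when they are needed.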

Webber
  • That seems to work in the editor, but what would the regex look like: ^/my_regex/$? I am actually trying to get this working in Java, so I need the full regex string – mh377 Jul 26 '19 at 16:03
  • @mh377 What are you matching against? A whole file or multi-line string from which you want to extract matching lines, or are you trying to validate a single string? – melpomene Jul 26 '19 at 16:13
  • The regex is used to mask postcodes in a log file. It is also masking UUIDs, which I don't want it to do – mh377 Jul 26 '19 at 16:17
  • @mh377 That's not what I asked. What is in the string you're matching against? If you don't know, at least post your code so we can check. – melpomene Jul 26 '19 at 16:18
  • @mh377 I have updated my answer to account for logfiles. – Webber Jul 27 '19 at 07:47
0

Your regex is fine; you just need to anchor it to the start and end of the string. Prepend ^ and append $ to the pattern, wrapping the alternation in a group so that both anchors apply to every alternative.

^(?:([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2}))$

https://regex101.com/r/jwLqLx/1
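
One detail to watch when anchoring a pattern that has a top-level alternation: ^ and $ bind more tightly than |, so without a group each anchor applies to only one alternative. A small Java sketch with toy patterns (not the postcode regex; the class and method names are hypothetical) shows the difference:

```java
import java.util.regex.Pattern;

class AnchorPrecedence {
    // ^ applies only to "cat" and $ only to "dog": the | splits the pattern in two.
    static final Pattern LOOSE = Pattern.compile("^cat|dog$");
    // Grouping the alternation makes both anchors apply to both alternatives.
    static final Pattern STRICT = Pattern.compile("^(?:cat|dog)$");

    // True if the pattern is found anywhere in the string.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dog", "hotdog", "catalog"}) {
            System.out.println(s + ": loose=" + found(LOOSE, s) + " strict=" + found(STRICT, s));
        }
    }
}
```

In Java, `Matcher.matches()` also tests the whole input without explicit anchors, which is another way to validate a single string.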

Marc G. Smith