
I am looking to match a postcode using the following regex:

(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})

I am trying to parse an address from an HTML document, so I only want to match nodes that start with a postcode or contain a postcode that is preceded by a space or comma. Otherwise there are too many false positives, e.g. matches inside colour codes (which are preceded by #).

I need to amend the regex so that the postcode either has no preceding characters, or is immediately preceded by a space or comma (with any number of characters before that). How can I do this?

For example, I would want to match:

IP14 2PL
1 The street, ipswich, IP14 2PL
1 The street, ipswich,IP14 2PL

BUT NOT

https://t.co/ip142plzruc
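To make the problem concrete, here is a minimal sketch that runs the regex from the question, unchanged, against the four example strings. It uses Python's `re` module purely for illustration (the question does not say which regex engine is in use), and it assumes case-insensitive matching, since the unwanted example is lowercase and could not match the uppercase-only character classes otherwise. The names `POSTCODE`, `pattern` and `samples` are illustrative.

```python
import re

# The postcode pattern from the question, split over two string literals
# only for line length; the concatenation is identical to the original.
POSTCODE = (
    r"(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)"
    r"|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})"
)

# Assumption: matching is case-insensitive in the real code.
pattern = re.compile(POSTCODE, re.IGNORECASE)

samples = [
    "IP14 2PL",
    "1 The street, ipswich, IP14 2PL",
    "1 The street, ipswich,IP14 2PL",
    "https://t.co/ip142plzruc",
]

# Every sample produces a match, including the URL, because nothing
# constrains the character that comes before the postcode.
for text in samples:
    m = pattern.search(text)
    print(repr(text), "->", m.group(0) if m else None)
```

The URL matches on its `ip142pl` substring, which is exactly the kind of false positive the question wants to exclude.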
Macros
  • Could you provide input and desired output? – Thomas Ayoub Feb 02 '16 at 12:54
  • "I am trying to parse an address from an HTML document". Note the OP isn't regexing the HTML (I [hope](http://stackoverflow.com/a/1732454/1901857)), just the extracted content, so don't downvote based purely on that statement :) – Rhumborl Feb 02 '16 at 12:57
  • I'm not regexing the HTML; I am parsing the HTML and matching the innertext of certain nodes against a regex. Why the downvote? – Macros Feb 02 '16 at 13:00
  • @Rhumborl don't assume that the person who comments is the person who downvoted ;) – Thomas Ayoub Feb 02 '16 at 13:15
  • @Thomas sorry, I don't follow. I posted that comment before any voting had occurred, to stop a reasonable question from being needlessly massacred. When I first read the question, I instinctively thought, "yeah, here's another one", then I actually read the regex and realised it's not. – Rhumborl Feb 02 '16 at 13:21

1 Answer


Just add this in front of your expression:

(?:^|[, ])

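This makes it mandatory for the postcode either to be preceded by a space or a comma, or to sit at the very start (`^` matches the start of the string, or the start of each line when multiline mode is enabled).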
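A minimal sketch of the suggested prefix applied to the question's examples, again using Python's `re` only for illustration and again assuming case-insensitive matching. Because `(?:^|[, ])` is non-capturing, the first capturing group is still the outer group of the original postcode pattern, so `group(1)` returns the postcode without the leading space or comma that `group(0)` would include. The helper name `find_postcode` is hypothetical.

```python
import re

# The postcode pattern from the question, unchanged.
POSTCODE = (
    r"(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)"
    r"|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})"
)

# Prefix from the answer: start of string, or a space or comma.
# Assumption: case-insensitive matching, as in the question's examples.
pattern = re.compile(r"(?:^|[, ])" + POSTCODE, re.IGNORECASE)


def find_postcode(text):
    """Return the first postcode at the start of text or after a space/comma, else None."""
    m = pattern.search(text)
    # group(1) is the postcode itself; group(0) would include the prefix character.
    return m.group(1) if m else None


for text in [
    "IP14 2PL",
    "1 The street, ipswich, IP14 2PL",
    "1 The street, ipswich,IP14 2PL",
    "https://t.co/ip142plzruc",
]:
    print(repr(text), "->", find_postcode(text))
```

Without `re.MULTILINE`, `^` only matches at the start of the whole string, which suits matching the text of a single node; pass `re.MULTILINE` if one string can hold several lines and a postcode may begin any of them.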

Toto