I have a collection of all uppercase address names and numbers and I want to extract just the first encountered address number for each address. The following examples show what I would like to extract from each:

  • 80 ROSE COTTAGE -> 80
  • 80A ROSE COTTAGE -> 80A
  • 80 A ROSE COTTAGE -> 80 A
  • 80ROSE COTTAGE -> 80 (accidental no-space)
  • [ANY OTHER TEXT] 80 ROSE COTTAGE -> 80

I have found some similar questions answered here and elsewhere on the internet, but they always deal with an address as a whole as opposed to specifically just address name and number:

Match each address from the address number to the 'street type'

regex street address match

Regular Expression: Any character that is NOT a letter or number

javascript regular expressions address number

JavaScript regex to validate an address

The last one makes reference to a lookahead, which led me to construct a negative lookahead that rejects any alphanumeric character following a potential single letter (e.g. 80 A) in my JavaScript regex. However, without adding the alternative "digits only" group (\d+), my fourth example above does not return just the number.

(?:\d+\s*[A-Z]?(?![A-Z0-9]))|(?:\d+)
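For reference, here is a minimal sketch of how this two-alternative pattern behaves on the examples above. It is written with balanced parentheses, and the helper name `extractWithTwoGroups` is hypothetical, used only for illustration.

```javascript
// Hypothetical helper: returns the first address number found, or null.
function extractWithTwoGroups(address) {
  // First alternative: digits, optional whitespace, optional single letter,
  // accepted only if no letter or digit follows.
  // Second alternative: plain digits, used when the first alternative fails.
  const pattern = /(?:\d+\s*[A-Z]?(?![A-Z0-9]))|(?:\d+)/;
  const match = address.match(pattern);
  return match ? match[0] : null;
}

console.log(extractWithTwoGroups("80 ROSE COTTAGE"));      // "80"
console.log(extractWithTwoGroups("80A ROSE COTTAGE"));     // "80A"
console.log(extractWithTwoGroups("80 A ROSE COTTAGE"));    // "80 A"
console.log(extractWithTwoGroups("80ROSE COTTAGE"));       // "80" (via the digits-only alternative)
console.log(extractWithTwoGroups("FLAT 80 ROSE COTTAGE")); // "80"
```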

Is there a way to combine these two groups into a single regular expression? Or is this not possible in JavaScript's regex implementation?

Any help with this would be greatly appreciated.

Derek
  • Does it really have to be that complicated? An address usually has only one number, which must be the number you are looking for. If it is followed directly by a character, as in `80A`, or by a character surrounded by spaces, as in `80 A `, then that is what you are looking for. – Ke Vin Sep 22 '14 at 11:04
  • Hi, thanks for your reply. The dataset is not perfect and, as with my last two examples, sometimes the number is not at the start, or a word follows the number without a separating space. Without using the lookahead, I found that 80ROSECOTTAGE would result in 80R when it should just be 80. Thus I have currently added the digits-only alternative group. This works, but I am wondering if there is a way to combine them without having the two groups. – Derek Sep 22 '14 at 11:23

1 Answer

(\d+\s*(?:[A-Z](?![A-Z]))?)

You can try this.

See demo.

http://regex101.com/r/kM7rT8/13
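As a rough sketch, the expression can be checked against the examples from the question like this. Note the assumptions: this uses `String.prototype.match` rather than a replace, the helper name `extractAddressNumber` is hypothetical, and the result is trimmed because `\s*` can capture a trailing space when the optional letter group does not match (e.g. `80 ROSE COTTAGE`).

```javascript
// Hypothetical helper built around the expression above (without the outer capture group).
function extractAddressNumber(address) {
  // Digits, optional whitespace, then optionally ONE letter that is
  // not followed by another letter. The letter and its lookahead are
  // grouped together so they succeed or are skipped as a unit.
  const pattern = /\d+\s*(?:[A-Z](?![A-Z]))?/;
  const match = address.match(pattern);
  // Trim because "\s*" may have consumed a space before a rejected letter.
  return match ? match[0].trim() : null;
}

console.log(extractAddressNumber("80 ROSE COTTAGE"));      // "80"
console.log(extractAddressNumber("80A ROSE COTTAGE"));     // "80A"
console.log(extractAddressNumber("80 A ROSE COTTAGE"));    // "80 A"
console.log(extractAddressNumber("80ROSE COTTAGE"));       // "80"
console.log(extractAddressNumber("FLAT 80 ROSE COTTAGE")); // "80"
```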

vks
  • Hi, I am attempting to apply your suggestion to the first group in my regular expression (so that I can drop the second). The best I seem to be able to do is: \d+\s*[A-Z]?(?![A-Z]{2,}) This works fine, except that it seems to drop the zero from 80 when the example text is '80ROSE COTTAGE'. Did you mean for it to be applied in some other way? What I require is: [At least one digit(s)] followed by [Any or no whitespace] followed by [Any one 'A to Z' that is not followed by another 'A to Z'] (or failing the last part, just the original digit(s)). I hope that makes sense? – Derek Sep 22 '14 at 14:50
  • @Derek you have to use the .replace function and replace with ``. Just use the regex given and nothing else. – vks Sep 22 '14 at 14:55
  • Hi, sorry, I didn't realise that was the case. Am I right in thinking that carrying out a replace using this on 'A 80 A' would return 'A 80 A' as opposed to '80 A'? – Derek Sep 22 '14 at 15:22
  • @Derek it would return `A 80 A` – vks Sep 22 '14 at 15:22
  • @Derek http://regex101.com/r/kM7rT8/12 – vks Sep 22 '14 at 15:24
  • Thanks for that, but I don't need a regular expression to replace all multiple 'A to Z' occurrences with "", because, as is the case with 'A 80 A' or '80 A N A D D R E S S W I T H T O O M A N Y S P A C E S', this would return false positives. I am trying to specifically pattern match: [At least one digit(s)] followed by [Any or no whitespace] followed by [Any one 'A to Z' that is not followed by another 'A to Z'] (or failing the last part, just the original digit(s)). I have managed to do this with the two groups, but I am wondering if it can be made into a single expression with no groups. – Derek Sep 22 '14 at 15:47
  • @Derek try now the new regex – vks Sep 22 '14 at 15:52
  • vks, thank you very much, that is exactly what I was after... I hadn't considered grouping the character-after-digits check together as you have done in that last expression. That is very helpful, and very much appreciated :-) – Derek Sep 22 '14 at 16:08
  • @Derek glad we could do it finally. :) – vks Sep 22 '14 at 16:09
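
To illustrate the point discussed in these comments, the sketch below compares the intermediate attempt `\d+\s*[A-Z]?(?![A-Z]{2,})` with the grouped expression from the answer. With the ungrouped lookahead, the engine can satisfy the pattern by backtracking into the digits, which is why the zero is dropped; grouping the letter with its lookahead lets the whole letter check be skipped instead. The helper name `firstMatch` is hypothetical.

```javascript
// Hypothetical helper: returns the first match of `pattern` in `text`, or null.
function firstMatch(pattern, text) {
  const match = text.match(pattern);
  return match ? match[0] : null;
}

// Intermediate attempt: the lookahead applies even when no letter was consumed,
// so "\d+" backtracks from "80" to "8" to find a position where it passes.
const ungrouped = /\d+\s*[A-Z]?(?![A-Z]{2,})/;

// Grouped version: the letter and its lookahead are optional as one unit.
const grouped = /\d+\s*(?:[A-Z](?![A-Z]))?/;

console.log(firstMatch(ungrouped, "80ROSE COTTAGE")); // "8"
console.log(firstMatch(grouped, "80ROSE COTTAGE"));   // "80"
console.log(firstMatch(grouped, "A 80 A"));           // "80 A"
```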