The task: I need to find all abbreviations of address-object identifiers in a string, using a list of those abbreviations, so that I can delete them later. The real abbreviation list is in another language and is way bigger (200+ elements), so a foreach over the list is out of the question (on the assumption that one complex regex beats a foreach loop in speed).
The problem:
A regex like this:

`(?:[^\w\d]|\A)(?:street|str|c|city|state|st|apt)([^\w\d]|\Z)`

works on a string like `Klutc state, Beast st, apt c5` and correctly gives `state`, `st`, `apt`.
But on the string `state Klutc, Beast st,apt c5` it returns `state` and `st`, but not `apt`, because the trailing `[^\w\d]` of the `st` match consumes the `,`, so that comma is no longer available as the left delimiter for `apt`.
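The consumption problem can be reproduced with a short script. This is a sketch assuming Python's `re` module (the question does not name a language); the abbreviation alternation is wrapped in a capture group only so it can be printed, and the right-hand delimiter group is made non-capturing, which does not change what text matches.

```python
import re

# The pattern from the question. Both delimiters ([^\w\d] or a string
# boundary) are part of the match, so each delimiter character is consumed.
FULL = re.compile(r"(?:[^\w\d]|\A)(street|str|c|city|state|st|apt)(?:[^\w\d]|\Z)")

def full_pattern_matches(text):
    # Report only the abbreviation, not the consumed delimiters around it.
    return [m.group(1) for m in FULL.finditer(text)]

print(full_pattern_matches("Klutc state, Beast st, apt c5"))  # ['state', 'st', 'apt']
print(full_pattern_matches("state Klutc, Beast st,apt c5"))   # ['state', 'st']
```

In the second string the match for `st` ends after the comma, and the search for the next match starts at `a`, where there is no delimiter left to satisfy the left-hand group.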
I also cannot use only the left side, `(?:[^\w\d]|\A)(?:street|str|c|city|state|st|apt)`, because it does not work on `Klutc state, Beast st, apt c5`: it also gives `c` from `c5`.
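A minimal reproduction of the left-side-only attempt, again assuming Python's `re` module and the same capture-group wrapping used only for printing:

```python
import re

# Left-hand delimiter only: nothing checks what follows the abbreviation,
# so the "c" at the start of "c5" also matches.
LEFT_ONLY = re.compile(r"(?:[^\w\d]|\A)(street|str|c|city|state|st|apt)")

def left_only_matches(text):
    return [m.group(1) for m in LEFT_ONLY.finditer(text)]

print(left_only_matches("Klutc state, Beast st, apt c5"))  # ['state', 'st', 'apt', 'c']
```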
Neither can I use only the right side, `(?:street|str|c|city|state|st|apt)([^\w\d]|\Z)`, because on the string `Klutc state, Beast st, apt c5` it also returns `st` from `Beast` and `c` from `Klutc`.
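The right-side-only attempt can be reproduced the same way (a sketch assuming Python's `re` module; the abbreviation is captured only for printing):

```python
import re

# Right-hand delimiter only: nothing checks what precedes the abbreviation,
# so word endings such as the "c" of "Klutc" and the "st" of "Beast" match.
RIGHT_ONLY = re.compile(r"(street|str|c|city|state|st|apt)(?:[^\w\d]|\Z)")

def right_only_matches(text):
    return [m.group(1) for m in RIGHT_ONLY.finditer(text)]

print(right_only_matches("Klutc state, Beast st, apt c5"))  # ['c', 'state', 'st', 'st', 'apt']
```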
The question:
How should I rewrite the regex so that it returns only the abbreviations? In other words, how do I make `st,` not steal the `,` from `,apt`, so that `st` and `apt` both use the same `,`? The test inputs are:
- `Klutc state, Beast st, apt c5`
- `state Klutc, Beast st,apt c5`
- `Klutc State,Beast st,c5 apt`
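One commonly used technique for this kind of overlap (offered as a sketch, not something the question confirms) is to replace the consuming delimiter groups with zero-width lookarounds: `(?<!\w)` before the alternation and `(?!\w)` after it. A lookaround only inspects the neighboring character without consuming it, so `st` and `apt` can both "see" the same comma. Since `\w` already includes digits, `[^\w\d]` is the same as `\W`, so the lookarounds express the original intent. The sketch assumes Python's `re` module and case-insensitive matching (the third input contains `State`); `ABBREVIATIONS` is a hypothetical stand-in for the real 200+ element list.

```python
import re

# Hypothetical stand-in for the real 200+ element abbreviation list.
ABBREVIATIONS = ["street", "str", "c", "city", "state", "st", "apt"]

# (?<!\w) and (?!\w) are zero-width: they check the neighboring character
# (or a string boundary) without consuming it, so two abbreviations separated
# by a single comma can both match. re.escape guards against list entries
# that contain regex metacharacters.
PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(map(re.escape, ABBREVIATIONS)) + r")(?!\w)",
    re.IGNORECASE,  # assumption: "State" should match "state"
)

def find_abbreviations(text):
    # With no capture groups, findall returns the matched abbreviations.
    return PATTERN.findall(text)

for s in ["Klutc state, Beast st, apt c5",
          "state Klutc, Beast st,apt c5",
          "Klutc State,Beast st,c5 apt"]:
    print(find_abbreviations(s))
# ['state', 'st', 'apt']
# ['state', 'st', 'apt']
# ['State', 'st', 'apt']
```

Because `(?!\w)` rejects a match that stops inside a word, the engine backtracks to the next alternative, so the order of word-only abbreviations in the list no longer matters. `\b(?:...)\b` behaves the same for word-only abbreviations; the two spellings differ only for entries that begin or end with a non-word character, such as `st.`.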