I've got a string that may or may not contain numbers. If there is a number, it is either standalone, as in '3200 Fedex FL' or '10 Downing St', or part of a name, as in '4th ST NW', 'I96' or 'US28'. I'm looking for a regex that removes the standalone numbers and returns the rest of the string, while keeping the numbers that are part of a name.

Here is what I tried:

Function getAddress(addr As String)
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = "(\b[\d]+\b)"
    RE.Global = True
    RE.IgnoreCase = True

    Set allMatches = RE.Execute(addr)

    If (allMatches.Count <> 0) Then
        result = allMatches.Item(0).submatches.Item(0)
    End If
    getAddress = result
End Function

Example Dataset

0 I64 EB MM 93
1519 KINGSCROSS RD
28 VA288 298
JOHN RANDOLPH RD
4700 WALMSLEY BL
BOWLING GREEN RD / BOB WHITE RD
BRUCE CT /FLORIDA AV
BUCK RD AND WHITEHALL RD
DOWNTOWN EWRESSWY EB 2ND ST
HYHLAND VISTA / GEORGE WASHINGTON BL
HYHLAND VISTA / GEORGE WASHINGTON BL
HYHLAND VISTA DR / GEORGE WASHINGTON BL
I95 25 43
LAUARL RIDGE MILL RD /CLARENCE RD
LAUARL RIDGE MILL RD /CLARENCE RD
NOVAH HOWARD ST /SEMINARY RD
OLD COUVAHOUSE RD R COUVAHOUSE RD
OLD COUVAHOUSE RD R COUVAHOUSE RD
WOODLAND AND ROANOKE
1501 SAMS CR1
15281 WHITEHEAD RD
1532 MARLBORO ST
16907 BRANDERS BRIDGE RD
1750 WILLIAM ST

Expected Output:
I64 EB MM
KINGSCROSS RD
VA288
JOHN RANDOLPH RD
BOWLING GREEN RD / BOB WHITE RD
BRUCE CT /FLORIDA AV
BUCK RD AND WHITEHALL RD
DOWNTOWN EWRESSWY EB 2ND ST
HYHLAND VISTA / GEORGE WASHINGTON BL
HYHLAND VISTA / GEORGE WASHINGTON BL
HYHLAND VISTA DR / GEORGE WASHINGTON BL
I95
LAUARL RIDGE MILL RD /CLARENCE RD
LAUARL RIDGE MILL RD /CLARENCE RD
NOVAH HOWARD ST /SEMINARY RD
OLD COUVAHOUSE RD R COUVAHOUSE RD
OLD COUVAHOUSE RD R COUVAHOUSE RD
WOODLAND AND ROANOKE
SAMS CR1
WHITEHEAD RD
MARLBORO ST
BRANDERS BRIDGE RD
WILLIAM ST

Cyclops

5 Answers

Try the following regex:

(?![0-9]+\s).

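The negative lookahead rejects every position where a run of digits followed by whitespace begins, so those digits are not matched. Note that it does not check for a word boundary, so the digits in a name such as 'I64' are also skipped when whitespace follows them.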
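
To see what this pattern actually keeps, here is a minimal Python sketch (Python is only a convenient choice here, and the helper name `keep_matches` is hypothetical). It joins every character the pattern matches on a single sample line.

```python
import re

# The pattern from this answer: match any character that does not start
# a run of digits followed by whitespace.
PATTERN = re.compile(r"(?![0-9]+\s).")


def keep_matches(line: str) -> str:
    """Join every character the pattern matches (hypothetical helper)."""
    return "".join(PATTERN.findall(line))


print(repr(keep_matches("1519 KINGSCROSS RD")))  # ' KINGSCROSS RD'
print(repr(keep_matches("0 I64 EB MM 93")))      # ' I EB MM 93'
```

On the second line the digits of I64 are dropped because a space follows them, while the trailing 93 is kept because nothing follows it. The pattern would therefore need a word-boundary check and an end-of-string case to fit the question's data.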

gaganshera

It seems easier to match the standalone numbers and replace them with an empty string. The regex for standalone numbers (with surrounding whitespace) is

\s*\b\d+\b\s*

Demo: https://regex101.com/r/JuEAQd/1

Note: the output in the demo above is a little bit mixed up because all the test strings are put into one input, but when they are taken one by one, the result is fine.
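
As a rough illustration of taking the strings one by one, here is a small Python sketch. The function names are hypothetical. It also makes one change to the approach above, which is my assumption and not part of this answer: each match is replaced with a single space and the result is trimmed, so that a number sitting between two words does not glue them together.

```python
import re

# Standalone digits plus any surrounding whitespace, as in the pattern above.
STANDALONE = re.compile(r"\s*\b\d+\b\s*")


def strip_standalone_numbers(line: str) -> str:
    """Remove standalone numbers from one line (hypothetical helper).

    Assumption: matches are replaced with a single space rather than an
    empty string, then the line is trimmed.
    """
    return STANDALONE.sub(" ", line).strip()


def clean_lines(text: str) -> list[str]:
    """Process a multi-line input one line at a time."""
    return [strip_standalone_numbers(line) for line in text.splitlines()]


print(clean_lines("0 I64 EB MM 93\n28 VA288 298\nDOWNTOWN EWRESSWY EB 2ND ST"))
# ['I64 EB MM', 'VA288', 'DOWNTOWN EWRESSWY EB 2ND ST']
```

Splitting into lines first keeps `\s*` from consuming the newline between two records, which is what mixes up the output when everything is pasted into one input.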

Dmitry Egorov

I believe this is the pattern you want. It matches either letters followed by digits and the rest of the word, or digits followed by letters and the rest of the word:

/\b([A-Za-z]+[0-9]+[A-Za-z0-9]*|[0-9]+[A-Za-z]+[A-Za-z0-9]*)\b/gm

And in action:

// [A-Za-z] is used instead of [A-z], which would also match [ \ ] ^ _ and `
var pattern = /\b([A-Za-z]+[0-9]+[A-Za-z0-9]*|[0-9]+[A-Za-z]+[A-Za-z0-9]*)\b/gm
// Each line ends with \n so the records stay separate after the line continuation
var str = "0 I64 EB MM 93\n\
1519 KINGSCROSS RD\n\
28 VA288 298\n\
JOHN RANDOLPH RD\n\
4700 WALMSLEY BL\n\
BOWLING GREEN RD / BOB WHITE RD\n\
BRUCE CT /FLORIDA AV\n\
BUCK RD AND WHITEHALL RD\n\
DOWNTOWN EWRESSWY EB 2ND ST\n\
HYHLAND VISTA / GEORGE WASHINGTON BL\n\
HYHLAND VISTA / GEORGE WASHINGTON BL\n\
HYHLAND VISTA DR / GEORGE WASHINGTON BL\n\
I95 25 43\n\
LAUARL RIDGE MILL RD /CLARENCE RD\n\
LAUARL RIDGE MILL RD /CLARENCE RD\n\
NOVAH HOWARD ST /SEMINARY RD\n\
OLD COUVAHOUSE RD R COUVAHOUSE RD\n\
OLD COUVAHOUSE RD R COUVAHOUSE RD\n\
WOODLAND AND ROANOKE\n\
1501 SAMS CR1\n\
15281 WHITEHEAD RD\n\
1532 MARLBORO ST\n\
16907 BRANDERS BRIDGE RD\n\
1750 WILLIAM ST"
// Prints the tokens that mix letters and digits, such as I64, VA288 and 2ND
console.log(str.match(pattern))
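
The pattern above returns only the tokens that mix letters and digits. A related idea, sketched here in Python as an alternative technique and not as this answer's method, is to split each line on whitespace and drop the tokens that consist of digits only. The function name `drop_numeric_tokens` is hypothetical.

```python
import re


def drop_numeric_tokens(line: str) -> str:
    """Keep every whitespace-separated token that is not purely digits.

    Hypothetical helper; note that it also collapses runs of whitespace
    into single spaces as a side effect of split/join.
    """
    tokens = line.split()
    kept = [tok for tok in tokens if not re.fullmatch(r"[0-9]+", tok)]
    return " ".join(kept)


for sample in ("0 I64 EB MM 93", "1501 SAMS CR1", "BRUCE CT /FLORIDA AV"):
    print(drop_numeric_tokens(sample))
```

Because the decision is made per token, names such as I64, CR1 or 2ND are kept without needing a word-boundary rule.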
Tezra

You didn't specify the programming language you're using, but you need to replace, not simply match. Something like this:

For Python:

import re
new_text = re.sub(r" ?\b\d+\b ?", "", old_text)

For PHP:

$new_text = preg_replace('/ ?\b\d+\b ?/', '', $old_text);

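For VB (.NET):

' Regex lives in this namespace; the Imports line goes at the top of the file
Imports System.Text.RegularExpressions

Dim ResultString As String
Try
    Dim RegexObj As New Regex(" ?\b\d+\b ?")
    ResultString = RegexObj.Replace(SubjectString, "")
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try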

Pedro Lobito

Use the Replace method of the RegExp object:

RE.Global = True   
RE.Pattern = "\b\d+(\s|$)"
result = RE.Replace(addr, "") ' Remove all matches from string
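
For readers who want to check the pattern outside VBA, here is a rough Python port of the same replace call, run against a few lines from the question. The name `get_address` mirrors the question's function, and the final `rstrip()` is my addition, since removing a number at the end of the string leaves the space before it.

```python
import re

# Digits that start at a word boundary and are followed by whitespace
# or the end of the string, as in the pattern above.
NUMBER_THEN_SPACE_OR_END = re.compile(r"\b\d+(\s|$)")


def get_address(addr: str) -> str:
    """Remove standalone numbers; rstrip() is an added cleanup step."""
    return NUMBER_THEN_SPACE_OR_END.sub("", addr).rstrip()


# A few input/expected pairs taken from the question's dataset.
samples = {
    "0 I64 EB MM 93": "I64 EB MM",
    "I95 25 43": "I95",
    "1501 SAMS CR1": "SAMS CR1",
    "DOWNTOWN EWRESSWY EB 2ND ST": "DOWNTOWN EWRESSWY EB 2ND ST",
}
for raw, expected in samples.items():
    print(raw, "->", get_address(raw), "| matches expected:", get_address(raw) == expected)
```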
trincot