
I want to match every occurrence of "-", except in these cases:

  1. [\d]-[A-Z]
  2. [A-Z]-[\d]

I tried this pattern: ((?<![A-Z])-(?![0-9]))|((?<![0-9])-(?![A-Z])) but some results are incorrect, for example: "RUA VF-32 N"

Can anyone help me?

melpomene
Danilo

1 Answer


A simple approach is to run grep with the contexts you want to exclude and invert the result, then run a second grep to keep only the items that contain a hyphen:

x <- c("QUADRA 120 - ASA BRANCA","FAZENDA LAGE -RODOVIA RIO VERDE","C-15","99-B","A-A")
grep("-", grep("[A-Z]-\\d|\\d-[A-Z]", x, invert=TRUE, value=TRUE), value=TRUE, fixed=TRUE)
# => [1] "QUADRA 120 - ASA BRANCA"         "FAZENDA LAGE -RODOVIA RIO VERDE"
#    [3] "A-A"   

Here, [A-Z]-\\d|\\d-[A-Z] matches a hyphen either between an uppercase ASCII letter and a digit or between a digit and an uppercase ASCII letter. Because of invert=TRUE, the inner grep returns only the items that contain no such match.

See the R demo.
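
As a rough sketch of what the nested call does, the two conditions can also be computed separately with grepl and combined. The variable names excluded, has_hyphen and result are only illustrative and are not part of the original answer:

```r
# Same sample vector as above
x <- c("QUADRA 120 - ASA BRANCA", "FAZENDA LAGE -RODOVIA RIO VERDE", "C-15", "99-B", "A-A")

# Step 1: TRUE where a hyphen sits between a letter and a digit (either order)
excluded <- grepl("[A-Z]-\\d|\\d-[A-Z]", x)

# Step 2: TRUE where the string contains a literal hyphen at all
has_hyphen <- grepl("-", x, fixed = TRUE)

# Keep items that have a hyphen but no letter-digit hyphen
result <- x[has_hyphen & !excluded]
print(result)
```

Note that this filters whole strings: an item such as "C-15 - X" contains one excluded hyphen and is therefore dropped entirely, even though its second hyphen would otherwise qualify.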

To match - only when it is not between an uppercase ASCII letter and a digit, you can use a PCRE regex (perl=TRUE) based on the SKIP-FAIL technique:

> grep("(?:\\d-[A-Z]|[A-Z]-\\d)(*SKIP)(*F)|-", x, perl=TRUE)
[1] 1 2 5

See this regex demo

Details

  • (?:\d-[A-Z]|[A-Z]-\d) - a non-capturing group that matches either a digit, -, and then an uppercase ASCII letter, or an uppercase ASCII letter, -, and then a digit
  • (*SKIP)(*F) - PCRE verbs that discard the text matched so far and make the regex engine resume searching at the position where that discarded match ended
  • | - or
  • - - a hyphen in any other context.
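
Since the pattern matches individual hyphens rather than whole strings, it can also be passed to gsub. The following is a minimal sketch, assuming a made-up test vector y and "#" as an arbitrary replacement character (neither comes from the question):

```r
# Hypothetical test strings, including the one from the question
y <- c("QUADRA 120 - ASA BRANCA", "RUA VF-32 N", "C-15 - LOTE 99-B", "A-A")

# SKIP-FAIL pattern from the answer above
pattern <- "(?:\\d-[A-Z]|[A-Z]-\\d)(*SKIP)(*F)|-"

# TRUE where at least one hyphen lies outside a letter-digit context
matched <- grepl(pattern, y, perl = TRUE)

# Replace only those hyphens; letter-digit hyphens are left untouched
replaced <- gsub(pattern, "#", y, perl = TRUE)

print(matched)
# [1]  TRUE FALSE  TRUE  TRUE
print(replaced)
# [1] "QUADRA 120 # ASA BRANCA" "RUA VF-32 N"  "C-15 # LOTE 99-B"  "A#A"
```

Unlike the inverted grep, this keeps "C-15 - LOTE 99-B" and changes only its standalone hyphen.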
Wiktor Stribiżew