
I want to match any "nonchar + digits" between a SIGNAL and an END word.

(?!SIGNAL)\\W+\\d+(?=END)

But on the following input it matches both +2 and ++7:

random+2END+SIGNAL+random++7END

Why is the +2 matched here? I want only the ++7, the one that sits between SIGNAL and END.

My final goal is to replace the match with blanks.

Example: https://regexr.com/4727h

Java code:

Pattern.compile(REGEX).matcher(input).replaceFirst(StringUtils.EMPTY);
Wiktor Stribiżew
membersound
  • As usual, using a capturing group. `/SIGNAL(your_pattern)END/.exec(s)[1]` – Wiktor Stribiżew Jan 24 '19 at 15:32
  • Sorry, but could you give a proper example using the `regexr` link above? Cause `SIGNAL(\W+\d+)END` won't match anything here! So I doubt this is a duplicate... – membersound Jan 24 '19 at 15:38
  • Why not? Look, [`SIGNAL.*?END`](https://regex101.com/r/bEF4wc/1) works. Please explain in a better detail what you want to get and why. – Wiktor Stribiżew Jan 24 '19 at 15:41
  • As written, I **only** want to match `++7` in this example. Not the signal word, not the end word. (As I finally want to run a regex-replace on that match of `++7` only, and keep the signal words.) And I don't want to match a wildcard, but only **nonchar+digits**. – membersound Jan 24 '19 at 15:42
  • Well, now, it seems you want `s.replace(/SIGNAL.*?END/g, function($0) { return $0.replace(/\W+\d+/g, ''); } )` – Wiktor Stribiżew Jan 24 '19 at 15:44
  • Well the replacement (actually I'd do this in java) should not matter for the question. The important part is: how can I match exactly only the nonchar+digits part? The replacement is then easy as added in my original question. – membersound Jan 24 '19 at 15:46
  • Correct the tags then. No, it is all important, together with the code. Add all relevant details to the question. At any rate, `(?!SIGNAL)` is redundant in your pattern and it is the same as if there were no `(?!SIGNAL)` there. What can be between `SIGNAL` and `END`? Anything? Can there be more than 1 occurrence of `non-word + digits` pattern? – Wiktor Stribiżew Jan 24 '19 at 15:49
  • Use of non-capturing groups should let you flag the SIGNAL and END and just capture what's inbetween: `/(?:SIGNAL.*?)(\W+\d+)(?:END)/` I chased SIGNAL with a lazy grab of everything until we hit `\W+\d+` – Brian Jan 24 '19 at 15:50
  • Hm ok, but still your last example includes "SIGNAL" and "END" inside the match, which is what I don't want. – membersound Jan 24 '19 at 15:51
  • It is in the match, but not in the capture group – Brian Jan 24 '19 at 15:51
  • Oh ok, I see. So if it's not possible to just create one entire match without the signal words, I could live with that. – membersound Jan 24 '19 at 15:52
  • If there may be any chars in between and the non-word+digit are followed with `END`, use `.replaceFirst("(SIGNAL.*?)\\W+\\d+(END)", "$1$2")`. It won't work if there are more of such patterns to remove in between the delimiting words. See [the regex demo](https://regex101.com/r/bEF4wc/3). – Wiktor Stribiżew Jan 24 '19 at 15:54
  • If you're working in an environment that supports look-behinds, `(?<=SIGNAL.*?)\W+\d+(?=END)` should match the original request. This works since the look-behind and look-ahead are both 0 length assertions – Brian Jan 24 '19 at 15:55
  • @Brian Actually, Java supports constrained-width lookbehind, but it is not known how many chars there can be between SIGNAL and END. `replaceFirst("(?<=SIGNAL.{0,100})\\W+\\d+(?=END)", "")` will work if there can be up to 100 any chars between SIGNAL and END. – Wiktor Stribiżew Jan 24 '19 at 15:57
  • @WiktorStribiżew Thanks, I wasn't sure what Java's support around look-behinds was – Brian Jan 24 '19 at 15:58
  • I tried both solutions but they still did not succeed... `(SIGNAL.*?)\\W+\\d+(END)` just replaces including the signal words. And using a length like `(?<=SIGNAL.{0,10})\\W+\\d+(?=END)` still replaces stuff outside the "SIGNAL...END" boundary. – membersound Jan 24 '19 at 16:02
  • No way, please share the exact string and code. See https://ideone.com/wlxLow – Wiktor Stribiżew Jan 24 '19 at 16:42
  • @membersound Please check my demo. If you precise your question I'd really be glad to help you out. Right now, it is plain unclear why all these hints do not work for you. – Wiktor Stribiżew Jan 24 '19 at 17:11
  • @WiktorStribiżew could you add your ideone example as an answer, so I could accept it. I probably had a typo in my code, because I just tested it and it works as expected. And maybe you could go into detail what `$1$2` is exactly for? tyvm! – membersound Jan 28 '19 at 09:20
  • Please give me some time. So, both the approaches work or just the first? – Wiktor Stribiżew Jan 28 '19 at 14:45
  • I like the first more because it's independent of the match size. Who knows if I'll have a longer string one day... – membersound Jan 28 '19 at 14:59
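To make the point from the comments concrete, here is a minimal sketch of why `+2` is matched. `(?!SIGNAL)` only asserts that the text starting at the current position is not `SIGNAL`; it does not look backwards, and since `\W+` must start on a non-word character, the assertion can never fail. The class name `OriginalPatternDemo` and the helper `findAll` are made up for illustration, and the bounded lookbehind assumes at most 100 characters between `SIGNAL` and the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // Input string taken from the question.
    static final String INPUT = "random+2END+SIGNAL+random++7END";

    // Hypothetical helper: collects every match of the regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Original pattern: the negative lookahead never rejects anything here.
        System.out.println(findAll("(?!SIGNAL)\\W+\\d+(?=END)", INPUT)); // [+2, ++7]
        // Same pattern without the lookahead: identical result.
        System.out.println(findAll("\\W+\\d+(?=END)", INPUT));           // [+2, ++7]
        // Bounded lookbehind from the comments: requires SIGNAL somewhere before.
        System.out.println(findAll("(?<=SIGNAL.{0,100})\\W+\\d+(?=END)", INPUT)); // [++7]
    }
}
```

Note that the lookbehind variant only checks that `SIGNAL` occurs within the preceding 100 characters, so it may also match after an earlier `END` if another `SIGNAL` is still in range.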

1 Answer


You may use

s.replaceFirst("(SIGNAL.*?)\\W+\\d+(END)", "$1$2")

The regex matches:

  • (SIGNAL.*?) - Capturing group 1 ($1): a SIGNAL substring and then any 0+ chars other than line break chars, as few as possible (as *? is a non-greedy, reluctant quantifier)
  • \W+ - 1 or more non-word chars (chars other than letters, digits and _)
  • \d+ - 1+ digits
  • (END) - Capturing group 2 ($2): an END substring
  • $1$2 - the replacement: two numeric backreferences that insert the Group 1 and Group 2 values back, so only the text matched by \W+\d+ is removed
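The breakdown above can be checked by printing the groups directly. This is only a sketch: the class name `GroupDemo` and the helper `groups` are hypothetical, and the input is the string from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    static final Pattern P = Pattern.compile("(SIGNAL.*?)\\W+\\d+(END)");

    // Hypothetical helper: returns {whole match, group 1, group 2},
    // or null when the pattern does not match at all.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("random+2END+SIGNAL+random++7END");
        System.out.println(g[0]); // SIGNAL+random++7END (the whole match)
        System.out.println(g[1]); // SIGNAL+random       ($1)
        System.out.println(g[2]); // END                 ($2)
    }
}
```

The whole match is what gets replaced; writing `$1$2` as the replacement restores everything except the `++7` in the middle.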

See the Java demo:

String s = "random+2END+SIGNAL+random++7END";
System.out.println(s.replaceFirst("(SIGNAL.*?)\\W+\\d+(END)", "$1$2"));
// => random+2END+SIGNAL+randomEND
Wiktor Stribiżew
  • Ok now it's clear: so you replace the full match with group1+group2. – membersound Jan 28 '19 at 16:14
  • What if the start would be `&SIGNAL` and the end would be the next `&`? To match `random+2&SIGNAL+random++7&SIGNAL+random++7&` with `(&SIGNAL.*?)\W+\d+(&)`? That does not work and only finds the first match! – membersound Jan 31 '19 at 15:20
  • Found it: `s.replaceFirst("(SIGNAL.*?)\\W+\\d+(?=END)", "$1")` I'm just excluding the 2nd group from the match with a positive lookahead, so it can be reused. – membersound Jan 31 '19 at 15:57
  • @membersound Correct, lookaheads do not consume text, only check if it is present or absent immediately to the right of the current location. – Wiktor Stribiżew Jan 31 '19 at 20:00
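A small sketch of the difference discussed in the last comments, using the `&SIGNAL` ... `&` input from above. It assumes `replaceAll` is used instead of `replaceFirst` so that every occurrence is processed; the class and method names are hypothetical.

```java
class LookaheadReuseDemo {
    static final String INPUT = "random+2&SIGNAL+random++7&SIGNAL+random++7&";

    // The closing & is consumed by group 2, so it is no longer available
    // as the opening & of the next "&SIGNAL" block.
    static String consumingEnd(String s) {
        return s.replaceAll("(&SIGNAL.*?)\\W+\\d+(&)", "$1$2");
    }

    // The closing & is only asserted by the lookahead, not consumed,
    // so the next match can start right at it.
    static String lookaheadEnd(String s) {
        return s.replaceAll("(&SIGNAL.*?)\\W+\\d+(?=&)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(consumingEnd(INPUT));
        // => random+2&SIGNAL+random&SIGNAL+random++7&
        System.out.println(lookaheadEnd(INPUT));
        // => random+2&SIGNAL+random&SIGNAL+random&
    }
}
```

With the consuming version only the first block is cleaned, which matches the behaviour reported in the comment; the lookahead version cleans both.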