JavaScript regex remove substrings not in larger string

Question

I have an input string containing a math expression that may contain comma-separated values. I need to remove those values when they do not occur within an aggregation function; in those cases, I just want the first value to remain.

Consider the following example strings:

max ( 100,200,30,4 )  GOOD expression, do nothing
min ( 10,23,111 )     GOOD expression, do nothing
min ( 10,20 )         GOOD expression, do nothing
10,2,34 + 4           BAD expression, remove extra comma-number sequences => 10 + 4

So far I have tried surrounding a comma-number pattern (,\d+)+ with negative lookbehind/lookaheads:

str.replaceAll(/(?<!(max|min)\s\(\s\d+)(,\d+)+(?!\s\))/g, '');

However, while this picks up the comma-number sequence outside of functions, it also incorrectly matches in valid situations:

max ( 100,200,30,4 )  GOOD expression
             ^^^      BAD match
min ( 10,23,111 )     GOOD expression
           ^^^        BAD match
min ( 10,20 )         GOOD expression
                      GOOD (non-match)
10,2,34 + 4           BAD expression
  ^^^^^               GOOD match

In each instance, I understand why it's matching, but I am at a loss as to how to prevent it.

How can I do this?

The fourth bird · Accepted Answer · 2021-01-22T16:32:16.973

You could use a capture group to capture what you want to keep, and match what you want to remove.

In the replacement you could check for group 1. If it exists, return the group, else return an empty string so that what is matched is removed.

((?:max|min)\s\(\s*\d+(?:\s*,\s*\d+)*\s*\))|(?:,\d+)+

( Capture group 1
- (?:max|min)\s Match either max or min and a whitespace char
- \(\s*\d+ Match ( followed by optional whitespace chars and 1+ digits
- (?:\s*,\s*\d+)*\s* Optionally repeat matching a comma between optional whitespace chars and 1+ digits, followed by optional whitespace chars
- \) Match )
) Close group 1
| Or
(?:,\d+)+ Match 1+ times a comma and 1+ digits (You could also add \s* again for optional whitespace chars before and after the comma)
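
The optional-whitespace variant mentioned in the last item could look roughly like the sketch below. It assumes spaces may appear on either side of a comma outside max/min; the helper name stripExtraValues is made up for illustration and is not part of the original answer.

```javascript
// Same idea as the pattern above, but the right-hand alternative also
// tolerates whitespace around each comma (an assumption about the input).
const pattern = /((?:max|min)\s\(\s*\d+(?:\s*,\s*\d+)*\s*\))|(?:\s*,\s*\d+)+/g;

// Hypothetical helper: keep whatever group 1 captured, drop everything else.
function stripExtraValues(expr) {
  return expr.replace(pattern, (match, keep) => (keep !== undefined ? keep : ""));
}

console.log(stripExtraValues("10 , 2, 34 + 4"));       // "10 + 4"
console.log(stripExtraValues("max ( 100, 200 ,30 )")); // "max ( 100, 200 ,30 )"
```

Note that the leading \s* also consumes the space before the first removed comma, which is why the first result has no doubled space.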


const regex = /((?:max|min)\s\(\s*\d+(?:\s*,\s*\d+)*\s*\))|(?:,\d+)+/g;
let items = [
  "max ( 100,200,30,4 )",
  "min ( 10,23,111 )",
  "min ( 10,20 )",
  "10,2,34 + 4"
].map(s => s.replace(regex, (m, g1) => g1 !== undefined ? g1 : ""));
console.log(items)

so `str.replaceAll(/((?:max|min)\s\(\s*\d+(?:\s*,\s*\d+)*\s*\))|((?:,\d+)+)/g, $1);`? - Erich, Jan 22 '21 at 16:24

score 0 · Answer 2 · answered Jan 22 '21 at 20:03

Took me a while to figure out what was going on in The fourth bird's answer. Quite a stroke of genius if you ask me.

For the sake of discussion, I will simplify the regex to the following, to find substrings that are not part of larger strings:

// all bcd's that are not in abcde
const regex = /(abcde)|(?:bcd)/g

If a match is found above (on either side of the pipe), an array is returned containing the full match at index 0, with additional indexes 1..n populated by capture groups in the expression as they occur in the expression from left to right.

By putting a capture group just on one side of the pipe, we know which side the match occurred on by whether indexes 1..n have anything in them.

If the match is made on the left side of the pipe, index 1 will contain abcde since the whole side is a capture group.

If the match is made on the right side of the pipe (a non-capture group), nothing is captured and index 1 will be undefined.
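
To see the two cases side by side, here is a small sketch that walks the matches with String.prototype.matchAll and records which side of the pipe each one came from. The input string and the sides array are chosen only for illustration.

```javascript
// all bcd's that are not in abcde
const regex = /(abcde)|(?:bcd)/g;

const sides = [];
for (const m of "abcdebcd".matchAll(regex)) {
  // m[0] is the full match; m[1] is group 1, or undefined if it did not participate
  sides.push({ match: m[0], group1: m[1], side: m[1] !== undefined ? "left" : "right" });
}

console.log(sides);
// [ { match: 'abcde', group1: 'abcde', side: 'left' },
//   { match: 'bcd', group1: undefined, side: 'right' } ]
```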

We can then use a simple replaceAll(regex, '$1');, where any matches found are replaced by the contents of the first capture group. Matches found on the left side of the pipe get replaced by themselves; those on the right get replaced with nothing.

// all bcd's that are not in abcde
const regex = /(abcde)|(?:bcd)/g
console.log('abcdebcdbcdbcd'.replaceAll(regex, '$1'))
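
The same '$1' replacement should work with the full pattern from the accepted answer, since a group that did not participate in the match is substituted as an empty string. A minimal sketch, using replace with the g flag (which behaves like replaceAll here) and the example strings from the question:

```javascript
const fullRegex = /((?:max|min)\s\(\s*\d+(?:\s*,\s*\d+)*\s*\))|(?:,\d+)+/g;

const inputs = ["max ( 100,200,30,4 )", "min ( 10,20 )", "10,2,34 + 4"];

// '$1' keeps the function call when group 1 matched and removes the match otherwise
const cleaned = inputs.map((s) => s.replace(fullRegex, "$1"));

console.log(cleaned);
// [ 'max ( 100,200,30,4 )', 'min ( 10,20 )', '10 + 4' ]
```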
