How to extract the content of also nested parentheses before and after a specific character?

Question

In the following string:

(10+10)*2*((1+1)*1)√(16)+(12*12)+2

I am trying replace ((1+1)*1)√(16) with nthroot(16,(1+1)*1).
Specifically, I want to extract everything in the first sets of brackets on each side of the √.
The brackets themselves could contain many layers of brackets and many different symbols.
Language is JavaScript.

I tried a couple things like <str>.replace(/$(.+)$√$(.+)$/g, 'nthroot($1,$2)')
but every one of my attempts at learning RegEx fails and I can't figure this out.

This is not a good match for regex because paren nesting is recursive. — elclanrs, Mar 05 '22 at 02:15
What rule causes you to replace `((1+1)*1)√(16)` rather than, say, replace `2*((1+1)*1)√(16)`? We cannot reverse-engineer a single example to figure out what you are thinking. Do *not* state a question in terms of an example. It will almost never be enough. Rather, state the question in words, precisely and unambiguously, then, where appropriate, give an example for illustration. When you give an example always show the desired result. — Cary Swoveland, Mar 05 '22 at 02:52
@CarySwoveland "everything in the first sets of brackets on each side of the `√`", the `2*` is not in the brackets. — Fnwp420, Mar 05 '22 at 07:21
The first set of brackets on the left side of '√' is `(10+10)` but that clearly is not what you want. What if the string were `(10+10)+2√(16)`? Perhaps it is guaranteed that '√' is always immediately preceded by a right parenthesis and you want everything between that parenthesis and the preceding matching left parenthesis. — Cary Swoveland, Mar 05 '22 at 08:08

oriberu · Accepted Answer · 2022-03-05T11:35:39.977

I don't think you can currently solve this in a general way with a regular expression in Javascript, since you can't match balanced parentheses recursively.

Personally, I'd approach this by splitting the text into its constituent characters, building groups of parentheses, and joining all back together with some logic. For example:

let text = '(10+10)*2*((1+1)*1)√(16)+(12*12)+2';
let changedText = '';
let parts = text.split('');
let parCount = null;
let group = '';
let groups = [];

// Group the original text into nested parentheses and other characters.
for (let i = 0; i < parts.length; i++) {
    // Keep a track of parentheses nesting; if parCount is larger than 0,
    // then there are unclosed parentheses left in the current group.
    if (parts[i] == '(') parCount++;
    if (parts[i] == ')') parCount--;

    group += parts[i];

    // Add every group of balanced parens or single characters.
    if (parCount === 0 && group !== '') {
        groups.push(group);
        group = '';
    }
}

// Join groups, while replacing the root character and surrounding groups
// with the nthroot() syntax.
for (let i = 0; i < groups.length; i++) {
    let isRoot = i < groups.length - 2 && groups[i + 1] == '√';
    let hasParGroups = groups[i][0] == '(' && groups[i + 2][0] == '(';

    // If the next group is a root symbol surrounded by parenthesized groups,
    // join them using the nthroot() syntax.
    if (isRoot && hasParGroups) {
        let stripped = groups[i + 2].replace(/^\(|\)$/g, '');
        changedText += `nthroot(${stripped}, ${groups[i]})`;
        // Skip groups that belong to root.
        i = i + 2;
    } else {
        // Append non-root groups.
        changedText += groups[i]
    }
}

console.log('Before:', text, '\n', 'After:', changedText);

Not saying it's pretty, though. ;)

Peter Seliger · Answer 2 · 2022-03-05T18:34:27.873

Parsing tasks, like what the OP is asking for, can not be covered by a regular expression alone.

Especially a token's correct parsing for nested parentheses needs a simple and regex free custom parsing process. Even more, as for the OP's use case one needs to parse a correct/valid parenthesized expression each from a left and a right hand-side token (the ones that are/were separated by √).

A possible approach could be based on a single split/reduce task with the collaboration of some specialized helper functions ...

// retrieves the correct parenthesized expression
// by counting parantheses from a token's left side.
function createFirstValidParenthesizedExpression(token) {
  let expression = '';

  if (token[0] === '(') { // if (token.at(0) === '(') {
    expression = '(';

    const charList = token.split('').slice(1);
    let char;

    let idx = -1;
    let balance = 1;

    while (
      (balance !== 0) &&
      ((char = charList[++idx]) !== undefined)
    ) {
      if (char === '(') {
        balance = balance + 1;
      } else if (char === ')') {
        balance = balance - 1;
      }
      expression = expression + char;
    }
    if (balance !== 0) {
      expression = '';
    }
  }
  return expression;
}
// retrieves the correct parenthesized expression
// by counting parantheses from a token's right side.
function createFirstValidParenthesizedExpressionFromRight(token) {
  let expression = '';

  if (token.slice(-1) === ')') { // if (token.at(-1) === ')') {
    expression = ')';

    const charList = token.split('').slice(0, -1);
    let char;

    let idx = charList.length;
    let balance = 1;

    while (
      (balance !== 0) &&
      ((char = charList[--idx]) !== undefined)
    ) {
      if (char === ')') {
        balance = balance + 1;
      } else if (char === '(') {
        balance = balance - 1;
      }
      expression = char + expression;
    }
    if (balance !== 0) {
      expression = '';
    }
  }
  return expression;
}

// helper which escapes all the possible math related
// characters which are also regex control characters.
function escapeExpressionChars(expression) {
  return expression.replace(/[-+*()/]/g, '\\$&');
}

function createNthRootExpression(leftHandToken, rightHandToken) {
  leftHandToken = leftHandToken.trim();
  rightHandToken = rightHandToken.trim();

  // patterns that match partial 'nthroot' expressions
  // which are free of parentheses.
  const regXSimpleLeftHandExpression = /[\d*/]+$/;
  const regXSimpleRightHandExpression = /^[\d*/]+|^\([^+-]*\)/;

  // retrieve part of the future 'nthroot' expression
  // from the token to the left of '√'.
  const leftHandExpression =
    leftHandToken.match(regXSimpleLeftHandExpression)?.[0] ||
    createFirstValidParenthesizedExpressionFromRight(leftHandToken);

  // retrieve part of the future 'nthroot' expression
  // from the token to the right of '√'.
  const rightHandExpression =
    rightHandToken.match(regXSimpleRightHandExpression)?.[0] ||
    createFirstValidParenthesizedExpression(rightHandToken);

  leftHandToken = leftHandToken
    .replace(
      // remove the terminating match/expression from the token.
      RegExp(escapeExpressionChars(leftHandExpression) + '$'),
      '',
    );
  rightHandToken = rightHandToken
    .replace(
      // remove the starting match/expression from the token.
      RegExp('^' + escapeExpressionChars(rightHandExpression)),
      ''
    );

  return [

    leftHandToken,
    `nthroot(${ rightHandExpression },${ leftHandExpression })`,
    rightHandToken,

  ].join('');
}

const sampleExpressionOriginal =
  '(10+10)*2*((1+1)*1)√(16)+(12*12)+2';
const sampleExpressionEdgeCase =
  '(10+10)*2*((1+1)*1)√16+(12*12)+2√(4*(1+2))+3';

console.log("+++ processing the OP's expression +++")
console.log(
  'original value ...\n',
  sampleExpressionOriginal
);
console.log(
  'original value, after split ...',
  sampleExpressionOriginal
    .split('√')
);
console.log(
  'value, after "nthroot" creation ...\n',
  sampleExpressionOriginal
    .split('√')
    .reduce(createNthRootExpression)
);
console.log('\n');

console.log("+++ processing a more edge case like expression +++")
console.log(
  'original value ...\n',
  sampleExpressionEdgeCase
);
console.log(
  'original value, after split ...',
  sampleExpressionEdgeCase
    .split('√')
);
console.log(
  'value, after "nthroot" creation ...\n',
  sampleExpressionEdgeCase
    .split('√')
    .reduce(createNthRootExpression)
);

.as-console-wrapper { min-height: 100%!important; top: 0; }

How to extract the content of also nested parentheses before and after a specific character?

2 Answers2