In this example...

(5) (dogs)  (5 dogs)  (dogs 5)

I would like to match only...

(5 dogs)  -or-  (dogs 5)

The numbers could be any number of digits, contain commas, decimal points, math operators, dollar signs, etc. The only thing I need to pay attention to is that there are both numbers and alpha characters present.

I started with this modification of an example provided by hrs using this for the RegEx...

\(((letter).*(number))\)|((number).*(letter))\)

to only capture this...

(number letter)  -or-  (letter number)

but not...

(number) (letter)

by modifying the expression to be...

\(((^[a-zA-Z]).*(^[0-9]))\)|((^[0-9]).*(^[a-zA-Z]))\)

...but obviously I don't know what I'm doing.

Alan M.

2 Answers

You can use positive lookaheads to assert that there are both a number and a letter within each set of parentheses:

\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)

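The first lookahead, (?=[^)\d]*\d), asserts that a digit occurs before the closing parenthesis: it skips any characters that are neither ) nor a digit, then requires a digit. The second lookahead, (?=[^)a-z]*[a-z]), does the same for a letter. Both must succeed at the position right after the (. The [^)]+ then matches the characters between the ( and ), and \) matches the closing parenthesis. With the case-insensitive flag, [a-z] also covers uppercase letters.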

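In JavaScript:

const str = '(5) (dogs)  (5 dogs)  (dogs 5)'
// i: [a-z] also matches uppercase letters; g: return every match, not just the first
const regex = /\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)/ig

console.log(str.match(regex)) // [ '(5 dogs)', '(dogs 5)' ]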
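
As a further check against the kinds of input the question mentions (commas, decimal points, math operators, dollar signs), here is a small sketch. The helper name `findMixedGroups` and the sample string are made up for illustration; the regex is the one from this answer.

```javascript
// Hypothetical helper (the name is illustrative): returns every parenthesised
// group that contains at least one digit and at least one letter.
function findMixedGroups(text) {
  const regex = /\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)/gi;
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(regex) || [];
}

// Made-up sample: two mixed groups, one digits-only group, one letters-only group.
const sample = 'Paid ($1,234.50 total) for (3+4) items (abc) and (x 2.5) more';

console.log(findMixedGroups(sample)); // [ '($1,234.50 total)', '(x 2.5)' ]
```

Because both negated classes exclude ), neither lookahead can reach past the closing parenthesis, so (3+4) cannot borrow a letter from a later group.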
Nick

As an alternative with a single lookahead:

\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)

Explanation

  • \( Match (
  • (?= Positive lookahead
    • [^)a-z]*[a-z] Match any char except ) or a-z, then match a-z
  • ) Close the lookahead
  • [^\d)]*\d Match any char except a digit or ) and then match a digit
  • [^)]* Match any char except )
  • \) Match )
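
The breakdown above can also be mirrored in code by assembling the pattern from its pieces, one per bullet. This is only a sketch for readability; the names `pieces` and `assembled` are illustrative, and the resulting pattern is the same as the one in this answer.

```javascript
// Each piece corresponds to one bullet of the explanation.
const pieces = [
  String.raw`\(`,                 // match (
  String.raw`(?=[^)a-z]*[a-z])`,  // lookahead: a letter occurs before the next )
  String.raw`[^\d)]*\d`,          // any chars except a digit or ), then a digit
  String.raw`[^)]*`,              // any remaining chars except )
  String.raw`\)`,                 // match )
];

// g: find every match, i: [a-z] also covers uppercase letters.
const assembled = new RegExp(pieces.join(''), 'gi');

console.log(assembled.source);
console.log('(5) (dogs)  (5 dogs)  (dogs 5)'.match(assembled)); // [ '(5 dogs)', '(dogs 5)' ]
```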

const s = '(5) (dogs)  (5 dogs)  (dogs 5)';
// i: case-insensitive, g: return all matches
const regex = /\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)/ig;

console.log(s.match(regex)); // [ '(5 dogs)', '(dogs 5)' ]
The fourth bird
  • I was about to say that it seemed that the lookahead could just use `[^)]` rather than `[^)a-z]` when I noticed that was what you had done in the regex demo. You might want to update the answer. I guess the same applies to the `[^\d)]*`? What's interesting though is that keeping the `a-z` and `\d` in those character classes actually makes the regex more efficient (if you count steps on PCRE), I guess because it helps it discard non-matches more rapidly. – Nick Jun 09 '22 at 07:33
  • @Nick Using the `a-z` and `\d` in the negated character class gets to a match faster using [contrast](https://www.rexegg.com/regex-style.html#contrast) – The fourth bird Jun 09 '22 at 08:18
  • Thanks for the link - it makes perfect sense. I've updated the regex in my answer to also take advantage of it. – Nick Jun 09 '22 at 08:29
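
To illustrate the point discussed in these comments, here is a minimal sketch comparing the variant that uses contrast with the simpler variant that only excludes ). It only checks that both return the same matches on the question's input; it does not measure the step counts mentioned above, since JavaScript does not expose them. The variable names are illustrative.

```javascript
const input = '(5) (dogs)  (5 dogs)  (dogs 5)';

// Variant using contrast: each negated class also excludes the token that follows it.
const withContrast = /\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)/gi;
// Simpler variant: the negated classes exclude only the closing parenthesis.
const withoutContrast = /\((?=[^)]*[a-z])[^)]*\d[^)]*\)/gi;

const contrastMatches = input.match(withContrast);
const plainMatches = input.match(withoutContrast);

console.log(contrastMatches); // [ '(5 dogs)', '(dogs 5)' ]
console.log(plainMatches);    // [ '(5 dogs)', '(dogs 5)' ]
```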