How do I use RegEx to find parens that contain BOTH numbers and letters, not just one or the other

Question

In this example...

(5) (dogs)  (5 dogs)  (dogs 5)

I would like to match only...

(5 dogs)  -or-  (dogs 5)

The numbers could be any number of digits, contain commas, decimal points, math operators, dollar signs, etc. The only thing I need to pay attention to is that there are both numbers and alpha characters present.

I started with this modification of an example provided by hrs using this for the RegEx...

\(((letter).*(number))\)|((number).*(letter))\)

to only capture this...

(number letter)  -or-  (letter number)

but not...

(number) (letter)

by modifying the expression to be...

\(((^[a-zA-Z]).*(^[0-9]))\)|((^[0-9]).*(^[a-zA-Z]))\)

...but obviously I don't know what I'm doing.

Nick · Answer 1 · 2022-06-09T08:26:39.167

You can use two positive lookaheads to assert that each set of parentheses contains both a digit and a letter:

\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)

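Each lookahead skips over characters that are neither a closing parenthesis nor the thing being looked for, then requires a digit (first lookahead, (?=[^)\d]*\d)) or a letter (second lookahead, (?=[^)a-z]*[a-z])). Because neither lookahead can cross a ), both conditions must hold inside the same pair of parentheses. The [^)]+ then matches the characters between the ( and ).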

Demo on regex101

In JavaScript:

const str = '(5) (dogs)  (5 dogs)  (dogs 5)'
const regex = /\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)/ig

console.log(str.match(regex))
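
The question mentions numbers with commas, decimal points, math operators and dollar signs. As a rough check, the same pattern can be run against a few inputs of that kind. The sample string and the names `pattern`, `samples` and `found` below are made up for illustration; only the regex itself comes from the answer.

```javascript
// Same pattern as above; only the sample string is invented for this sketch.
const pattern = /\((?=[^)\d]*\d)(?=[^)a-z]*[a-z])[^)]+\)/ig;

// Digits only, letters only, then three groups that mix both.
const samples = '($1,200.50) (total) ($1,200.50 total) (3+4 apples) (x2)';

// With the g flag, match() returns every matching substring (or null if none).
const found = samples.match(pattern);

console.log(found);
// [ '($1,200.50 total)', '(3+4 apples)', '(x2)' ]
```

Punctuation and symbols need no special handling, since the lookaheads only care that a digit and a letter each appear before the next closing parenthesis.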

The fourth bird · Accepted Answer · 2022-06-09T08:16:26.603


As an alternative with a single lookahead:

\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)

Explanation

\( Match (
(?= Positive lookahead
- [^)a-z]*[a-z] Match any char except ) or a-z, then match a-z
) Close the lookahead
[^\d)]*\d Match any char except a digit or ) and then match a digit
[^)]* Match any char except )
\) Match )
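
The breakdown above can also be mirrored in code by assembling the pattern from commented pieces. This is only a sketch: the `parts` array and the name `assembled` are illustrative, and `String.raw` is used so the backslashes do not need doubling.

```javascript
// Each entry corresponds to one line of the explanation.
const parts = [
  String.raw`\(`,                 // match (
  String.raw`(?=[^)a-z]*[a-z])`,  // lookahead: a letter occurs before the next )
  String.raw`[^\d)]*\d`,          // any chars except a digit or ), then a digit
  String.raw`[^)]*`,              // any remaining chars except )
  String.raw`\)`,                 // match )
];

// i: case-insensitive, so [a-z] also covers A-Z; g: find all matches.
const assembled = new RegExp(parts.join(''), 'ig');

const result = '(5) (dogs)  (5 dogs)  (dogs 5)'.match(assembled);
console.log(result);
// [ '(5 dogs)', '(dogs 5)' ]
```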

Regex demo

const s = '(5) (dogs)  (5 dogs)  (dogs 5)';
const regex = /\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)/ig;

console.log(s.match(regex));

edited Jun 09 '22 at 08:16

answered Jun 09 '22 at 05:46


I was about to say that it seemed that the lookahead could just use `[^)]` rather than `[^)a-z]` when I noticed that was what you had done in the regex demo. You might want to update the answer. I guess the same applies to the `[^\d)]*`? What's interesting though is that keeping the `a-z` and `\d` in those character classes actually makes the regex more efficient (if you count steps on PCRE), I guess because it helps it discard non-matches more rapidly. – Nick Jun 09 '22 at 07:33
@Nick Using the `a-z` and `\d` in the negated character class gets to a match faster using [contrast](https://www.rexegg.com/regex-style.html#contrast) – The fourth bird Jun 09 '22 at 08:18

Thanks for the link - it makes perfect sense. I've updated the regex in my answer to also take advantage of it. – Nick Jun 09 '22 at 08:29
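
To illustrate the point from the comments, here is a small sketch comparing the "contrast" form with a relaxed form that uses plain `[^)]` throughout. The relaxed pattern is reconstructed from the comment rather than copied from either answer, and the variable names are hypothetical. The sketch only checks that both forms return the same matches; it does not measure step counts, which depend on the regex engine.

```javascript
// "Contrast" form from the accepted answer.
const contrast = /\((?=[^)a-z]*[a-z])[^\d)]*\d[^)]*\)/ig;

// Relaxed form (an assumption based on the comment): [^)] in every class.
const relaxed = /\((?=[^)]*[a-z])[^)]*\d[^)]*\)/ig;

const input = '(5) (dogs)  (5 dogs)  (dogs 5)';

const contrastMatches = input.match(contrast);
const relaxedMatches = input.match(relaxed);

console.log(contrastMatches); // [ '(5 dogs)', '(dogs 5)' ]
console.log(relaxedMatches);  // [ '(5 dogs)', '(dogs 5)' ]
```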
