
I'm using the following regex to match all words:

mystr.replace(/([^\W_]+[^\s-]*) */g, function (match, p1, index, title) { /* ... */ });

Note that words can contain special characters such as German umlauts. How can I match all words except those inside parentheses?

If I have the following string:

here wäre c'è (don't match this one) match this

I would like to get the following output:

here
wäre
c'è
match
this

The trailing spaces don't really matter. Is there an easy way to achieve this with a regex in JavaScript?

EDIT: I cannot remove the text in parentheses, because the final string "mystr" must still contain it; the string operations should only be applied to the matched text. The final string in "mystr" could look like this:

Here Wäre C'è (don't match this one) Match This
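Running the question's pattern on the sample string makes the problem concrete. Below is a minimal sketch (Node or a browser console; the `captures` helper is hypothetical and only collects Group 1). It also illustrates a side issue with umlauts: in JavaScript `\w`/`\W` cover ASCII only, so `wäre` is matched whole only because it starts with `w`, while a word that begins with an umlaut, such as the made-up input `Über alles`, loses its first letter. The last variant assumes an ES2018+ engine with Unicode property escapes (`\p{L}`, `\p{N}` with the `u` flag).

```javascript
// The pattern from the question, applied to the sample string.
const questionRegex = /([^\W_]+[^\s-]*) */g;
const sample = "here wäre c'è (don't match this one) match this";

// Hypothetical helper: collect the Group 1 captures that a replace()
// callback would receive, leaving the text itself unchanged.
function captures(regex, text) {
  const words = [];
  text.replace(regex, (match, p1) => {
    words.push(p1);
    return match;
  });
  return words;
}

const sampleWords = captures(questionRegex, sample);
console.log(sampleWords.join(", "));
// here, wäre, c'è, don't, match, this, one), match, this

// \w is ASCII-only in JavaScript, so the leading Ü is skipped.
const umlautWords = captures(questionRegex, "Über alles");
console.log(umlautWords.join(", ")); // ber, alles

// Assumption: ES2018+ engine. \p{L} (any letter) and \p{N} (any number)
// require the u flag; this variant keeps the leading Ü.
const unicodeRegex = /([\p{L}\p{N}]+[^\s-]*) */gu;
const unicodeWords = captures(unicodeRegex, "Über alles");
console.log(unicodeWords.join(", ")); // Über, alles
```

Note that the parenthesized words are captured too, and `one)` even keeps the closing paren, which is exactly what the question wants to avoid.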
thomasf
  • I don't think it's possible with a single regex; you'll probably need to cut out the parentheses and their content first. – Sergey Rybalkin Oct 15 '12 at 11:44
  • Do you need to account for nested (like this (or even this)) parentheses? If so, you will have to impose an upper bound on the nesting or go to a non-RE-based solution. – Vatine Oct 15 '12 at 16:21
  • No need to account for nested parentheses. There can be several pairs of parentheses, but they will never be nested, e.g. "(like this) and like (this)". – thomasf Oct 16 '12 at 08:08
  • I'm accepting Fabrizio's answer because it was correct before I made my question more specific. To solve my problem, I will search for the opening and closing parens inside the callback function. That's not as nice as a regex, but it works well. – thomasf Oct 16 '12 at 12:38
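The callback approach described in the last comment can be sketched as follows, assuming (as stated in the comments) that parentheses are never nested. In a `replace()` callback, the `index` and `title` arguments are the match offset and the whole string, so the text before each match tells us whether an unclosed `(` precedes it. The `capitalize` helper is a hypothetical stand-in for whatever string operation is really applied; it is chosen here so the result reproduces the target string from the EDIT.

```javascript
const mystr = "here wäre c'è (don't match this one) match this";

// Hypothetical transformation: upper-case the first character of a word.
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

const result = mystr.replace(/([^\W_]+[^\s-]*) */g, function (match, p1, index, title) {
  const before = title.slice(0, index);
  // With non-nested parens, the match is inside a pair if the last "("
  // before it comes after the last ")" before it.
  const insideParens = before.lastIndexOf("(") > before.lastIndexOf(")");
  if (insideParens) return match;                  // leave parenthesized words untouched
  return capitalize(p1) + match.slice(p1.length);  // keep the trailing spaces
});

console.log(result); // Here Wäre C'è (don't match this one) Match This
```

This scans the prefix once per match, which is fine for short titles; nested parentheses would need a real counter instead of `lastIndexOf`.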

2 Answers


Try this:

var str = "here wäre c'è (don't match this one) match this";

str.replace(/\([^\)]*\)/g, '')  // remove text inside parens (& parens)
   .match(/(\S+)/g);            // match remaining text

// ["here", "wäre", "c'è", "match", "this"]
Fabrizio Calderan
  • BTW, parens have no meaning in a character class, hence they don't need to be escaped: `[^)]` is perfectly fine. The same goes for most other metacharacters (the exceptions being `]`, `\`, and, depending on position, `^` and `-`). – Tomalak Oct 15 '12 at 11:54
  • Yes, indeed. I just always escape special characters out of personal habit, even when it isn't necessary. – Fabrizio Calderan Oct 15 '12 at 11:57
  • Thanks Fabrizio, but I was not specific enough in my question. I cannot remove the text in parentheses, because the whole string, including the parenthesized text, should be returned, while string operations are performed on the matches. – thomasf Oct 15 '12 at 16:00
  • I would have trouble combining the modified copy with the original, since the final output must contain both the modified matches and the ignored text in parentheses. – thomasf Oct 16 '12 at 08:05

Thomas, I'm resurrecting this question because it has a simple solution that wasn't mentioned and that doesn't require replacing and then matching (one step instead of two). (I found your question while researching a general question about how to exclude patterns in regex.)

Here's our simple regex (see it at work on regex101; look at the Group 1 captures in the bottom-right panel):

\(.*?\)|([^\W_]+[^\s-]*)

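The left side of the alternation matches complete (parenthesized phrases). We will ignore these matches. The right side matches and captures words to Group 1, and we know they are the right words because they were not matched by the expression on the left: at each position the engine tries the left alternative first, so a word inside parentheses is consumed as part of the parenthesized phrase before the right side can match it.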

This program shows how to use the regex (see the matches in the online demo):

<script>
var subject = 'here wäre c\'è (don\'t match this one) match this';
var regex = /\(.*?\)|([^\W_]+[^\s-]*)/g;
var group1Caps = [];
var match = regex.exec(subject);

// put Group 1 captures in an array
while (match != null) {
    if( match[1] != null ) group1Caps.push(match[1]);
    match = regex.exec(subject);
}

document.write("<br>*** Matches ***<br>");
// print each Group 1 capture on its own line
for (var i = 0; i < group1Caps.length; i++) {
    document.write(group1Caps[i], "<br>");
}

</script>
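Since the question's real goal (see the EDIT) is to transform the words while keeping the parenthesized text, the same alternation can also be passed straight to `replace()`. This is a minimal sketch, not part of the original answer: when the left alternative matches, Group 1 is `undefined` in the callback and the match is returned unchanged. `capitalizeFirst` is a hypothetical placeholder for the actual string operation.

```javascript
const input = "here wäre c'è (don't match this one) match this";
const wordsOutsideParens = /\(.*?\)|([^\W_]+[^\s-]*)/g;

// Hypothetical transformation applied to each Group 1 word.
function capitalizeFirst(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

const output = input.replace(wordsOutsideParens, function (match, word) {
  // word is undefined when the parenthesized-phrase alternative matched.
  return word === undefined ? match : capitalizeFirst(word);
});

console.log(output); // Here Wäre C'è (don't match this one) Match This
```

Text that neither alternative matches (here, the spaces) is simply left in place, so no trailing ` *` is needed.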

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

zx81
  • please could you help me with this http://stackoverflow.com/questions/23797093/regex-email-validation-that-allows-only-hyphens-in-the-middle-of-the-domain-and – Axel May 22 '14 at 05:26