5

I need a regex in JavaScript. I have a string:

'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'

I want to split this string by periods such that I get an array:

[
    '*window',
    'some1',
    'some\.2',   //ignore the . because it's escaped
    '(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])',  //ignore everything inside ()
    'some\.3',
    '(this.o.p ? ".mike." [ff\.])',
    'some5'
]

What regex will do this?

outis
user1031396

5 Answers

7
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern) || []; // With /g, match() returns an Array of all matches, or null when there are none

Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:

/             Start of RegExp literal
(?:            Create a group without reference (example: say, group A)
   \(          `(` character
   (?:         Create a group without reference (example: say, group B)
      (['"])     ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
      \)         `)` character
      \1         The character as matched at group 1, either `'` or `"`
     |          OR
      [^)]+?     Any non-`)` character, at least once (see below)
   )+          End of group (B). Let this group occur at least once
  |           OR
   \\\.        `\.` (escaped backslash and dot, because they're special chars)
  |           OR
   [^.]+?      Any non-`.` character, at least once (see below)
)+            End of group (A). Let this group occur at least once
/g            End of RegExp literal, global flag
        /*Summary: Match everything which is not satisfying the split-by-dot
                 condition as specified by the OP*/

There's a difference between + and +?. A plain + is greedy: it matches as many characters as possible. A +? is lazy: it matches only as many characters as are needed for the whole RegExp to match. Example: against 123, \d+? matches 1, while \d+ matches 123.
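The difference is easy to check in a console. This is a small sketch; the variable names are only for illustration. The third line shows that a lazy quantifier still expands when the rest of the pattern requires it.

```javascript
// Greedy: + takes as many digits as it can.
var greedy = '123'.match(/\d+/)[0];

// Lazy: +? stops as soon as the pattern as a whole can match.
var lazy = '123'.match(/\d+?/)[0];

// Lazy, but followed by a literal dot: +? has to grow until the dot can match.
var lazyThenDot = '123.'.match(/\d+?\./)[0];

console.log(greedy);      // 123
console.log(lazy);        // 1
console.log(lazyThenDot); // 123.
```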

Because of the g (global) flag, the String.match method performs a global match: it returns an array of all matched substrings, or null if nothing matches.

When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:

Index 0: <Whole match>
Index 1: <Group 1>
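A short sketch of both cases, using a made-up input string and pattern purely for illustration:

```javascript
var text = 'a1.b2';

// With the g flag: every whole match, no capture groups.
var all = text.match(/([a-z])(\d)/g);

// Without the g flag: only the first match, followed by its capture groups.
var first = text.match(/([a-z])(\d)/);

console.log(all);      // [ 'a1', 'b2' ]
console.log(first[0]); // a1  (whole match)
console.log(first[1]); // a   (group 1)
console.log(first[2]); // 1   (group 2)
```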
Rob W
  • I am developing a Javascript binding framework. The splitted values are property chains. The above example is something I quickly made up. The above example really means ... – user1031396 Nov 05 '11 at 20:26
  • *window (javascript window object) has a property called "some1" has a property called some\.2 evaluate the expression a.b ? cc\.c : d.n whenever a.b OR cc\.c changes and so on and so forth. Hope this answers the question and sorry about the multiple posts. Hitting the enter button does a post instead of a new line. – user1031396 Nov 05 '11 at 20:29
  • This is not correct. (a.b ? cc\.c : d.n [a.b, cc\.c]) you split this result, while you shouldn't. – FailedDev Nov 05 '11 at 20:37
  • @FailedDev Updated, it now correctly deals with quoted parentheses. – Rob W Nov 05 '11 at 20:42
  • Rob W,
    Seems like this might be the solution.
    Slight change to the string, adding ')'
    *window.some1.some\.2.(a.b + ")" + ')' ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5
    – user1031396 Nov 05 '11 at 20:53
  • @user1031396 HTML doesn't work in comments. Only very basic mark-down (links and code using a backtick) are enabled. – Rob W Nov 05 '11 at 20:55
  • @RobW Thks. Rewriting the above comment. Slight change to the string, adding ')' in the string to be splitted. *window.some1.some\.2.(a.b + ")" + ')' ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5 – user1031396 Nov 06 '11 at 19:56
  • What do you mean? The current expression returns the results as requested in the question. If you want additional features, create a new question, linking to this one, because the current answers are aimed at your current question. – Rob W Nov 06 '11 at 20:17
  • @Rob W, can you please post an explanation of your regex. It will help me maintain it in the future. – user1031396 Nov 10 '11 at 00:16
  • @Rob W, very much appreciated. – user1031396 Nov 15 '11 at 18:57
3

The regex below can be used to get the desired results:

result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);

Each whole match includes the trailing ., so the segments you want are in group 1. To read group 1, use exec in a loop:

var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var results = [];
var match;
// With the g flag, each exec() call resumes from myregexp.lastIndex
while ((match = myregexp.exec(subject)) !== null) {
    // match[0] is the whole match (with the trailing "."); match[1] is the segment alone
    results.push(match[1]);
}

Explanation :

// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
// 
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
//    Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
//          Match the character “(” literally «\(»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match a single character NOT present in the list “'"” «[^'"]»
//          Match the character “)” literally «\)»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match a single character that is NOT a “\” character «[^\\]»
//    Match the regular expression below «(?:\.|$)»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
//          Match the character “.” literally «\.»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
//          Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
FailedDev
2

It is notoriously difficult to use a Regex to do balanced parenthesis matching, especially in Javascript.

You would be way better off creating your own parser. Here's a clever way to do this that still uses the strengths of regexes:

  • Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
  • Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
  • Add the matching text to a buffer.
  • If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.

This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!

Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:

  • Uses a Regex pattern to find the splits
  • Only splits if there are balanced parentheses
  • Only splits if there are balanced quotes
  • Allows escaping of parentheses, quotes, and splits using \

This code will work perfectly for your example.
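A minimal sketch of this approach, not the linked code itself. The function name splitOnDots is made up for illustration, and two details are assumptions: the token pattern adds a quote alternative to the one listed earlier, and the segments are cut with slice and the match offset instead of being collected in a buffer.

```javascript
// Hypothetical helper, written for illustration only.
function splitOnDots(input) {
    // Tokens of interest: escaped char, quote, opening bracket, closing bracket, dot
    var pattern = /(\\.)|(["'])|([(\[{])|([)\]}])|(\.)/g;
    var results = [];
    var depth = 0;     // number of brackets currently open
    var quote = null;  // quote character we are inside, or null
    var start = 0;     // index where the current segment begins

    // replace() is used only to visit each token; the string is returned unchanged.
    input.replace(pattern, function (match, escaped, q, open, close, dot, offset) {
        if (escaped) {
            // An escaped character is never special.
        } else if (quote) {
            // Inside quotes, only the matching quote character matters.
            if (q === quote) quote = null;
        } else if (q) {
            quote = q;
        } else if (open) {
            depth++;
        } else if (close) {
            if (depth > 0) depth--;
        } else if (dot && depth === 0) {
            // A top-level, unquoted, unescaped dot: close the current segment.
            results.push(input.slice(start, offset));
            start = offset + 1;
        }
        return match;
    });

    results.push(input.slice(start)); // whatever follows the last split
    return results;
}

var sample = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var parts = splitOnDots(sample);
console.log(parts);
```

Because the bracket depth is a counter, nested brackets such as the [a.b, cc\.c] inside the parentheses are handled without any extra regex work.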

Scott Rippey
0

There's no need for a complicated regex here.

var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';

console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));

output:

  ["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
Kakashi
  • That doesn't appear to meet the requirements of the question (ignoring `\.`, and ignoring `.` inside parentheses...) – Greg Hewgill Nov 05 '11 at 20:11
  • Yet this isn't what the OP wanted. The OP wanted the text inside the `()` to remain as one unit (even though there are dots inside it), and an escaped dot (`\.`) should be ignored as well. – Madara's Ghost Nov 05 '11 at 20:13
  • This is the first time I have posted a question on stackflow and am amazed at the quick responses. Thanks stacksflow and thanks to all who responded. – user1031396 Nov 05 '11 at 20:23
0

So, was working with this, and now I see that @FailedDev is rather not a failure, since that was pretty nice. :)

Anyhow, here's my solution. I'll just post the regex only.

((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)

Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
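Lookbehind was added to JavaScript in ECMAScript 2018, so on an engine that implements it the pattern can be tried directly. A minimal check, assuming the string from the question with its backslashes doubled so they survive the string literal:

```javascript
// Requires an engine with lookbehind support (ECMAScript 2018 or later).
var input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';

// Same pattern as above, with the g flag added so match() returns every segment.
var lookbehindPattern = /((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)/g;

var segments = input.match(lookbehindPattern);
console.log(segments);
```

On this input it yields the seven segments asked for. It still stops at the first `)` that has no double quote on either side, so nested parentheses or a single-quoted `')'` would need a different approach.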

Gaute Løken
  • As you have mentioned, you're using a look-behind, which are not supported in JavaScript. Even if look-behinds were supported, `?!` has to be `?=` (look-ahead). – Rob W Nov 05 '11 at 21:13
  • No, I wanted negative lookbehind, not lookahead. I wanted to match the ) character that's not preceded by a " character => Negative lookbehind. – Gaute Løken Nov 05 '11 at 21:15
  • I am referring to the `?!` (at `\)(?!`). You want to match a parenthesis which is preceded and postfixed by a double-quote character. – Rob W Nov 05 '11 at 21:18
  • In that case, no I want to end my match at the first ) which is not enclosed in double quotes. So I want to match a parenthesis which is not preceded nor postfixed by double-quote characters. So my reasoning was sound. I do see a bug in it however, but I'm not going to point it out for you since you're poking at me. :) – Gaute Løken Nov 05 '11 at 22:56