
We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.

For example, we would like the following string to be split into the following components:

1|2|3\|4|5

1
2
3\|4
5

I expect to use the JavaScript function split, which takes a regular expression. What regex should I pass to split? We are cross-platform and would like to support the current and previous versions (one version back) of IE, FF, and Chrome if possible.

elclanrs
MedicineMan
  • The MDN docs suggest javascript REs support lookahead assertions but not lookbehind assertions. Anyone know if this is accurate? Because a lookbehind assertion is the trivial solution here. – Lily Ballard Oct 05 '12 at 21:34
  • @KevinBallard unfortunately MDN is right. See e.g. http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript – Kijewski Oct 06 '12 at 02:52

3 Answers


Instead of a split, do a global match (the same way a lexical analyzer would):

  • match anything other than \\ or |
  • or match any escaped char

Something like this:

var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);

A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or any escaped character (pattern: \\.); the | between the two alternatives is the alternation operator. The + after the group tells the engine to match it one or more times, so each match is a run of ordinary and escaped characters that ends at the next unescaped '|'. The g at the end of the regex literal tells the JavaScript regex engine to match the pattern globally instead of matching it just once.
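The match-based approach can be wrapped in a small helper. This is only a minimal sketch: the name splitOnUnescapedPipe is hypothetical, and it assumes that empty fields may be dropped, because the + quantifier never produces an empty match.

```javascript
// Hypothetical helper wrapping the global match described above.
function splitOnUnescapedPipe(str) {
    // [^\\|] matches one character that is neither a backslash nor a pipe.
    // \\.    matches a backslash followed by any one character (an escape).
    // match() returns null when nothing matches, so fall back to [].
    return str.match(/(?:[^\\|]|\\.)+/g) || [];
}

var fields = splitOnUnescapedPipe("1|2|3\\|4|5");
// fields is ["1", "2", "3\\|4", "5"]

// An escaped backslash is consumed as one unit, so the pipe after it
// still acts as a separator.
var withEscapedBackslash = splitOnUnescapedPipe("a\\\\|b");
// withEscapedBackslash is ["a\\\\", "b"]
```

Note that with this sketch "1||2" yields ["1", "2"] rather than ["1", "", "2"], and a lone backslash at the very end of the input is skipped.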

Bart Kiers
  • I guess that is a good way to do it. The other thing you can do is to find the matches for the pattern `[^\\]([|])` and save everything that is not group(1). The advantage is that the regular expression is a bit easier on the eyes. – Maarten Bodewes Oct 06 '12 at 00:44
  • could you explain the expression that you've provided above? I'm quite new to regex and related technologies. – MedicineMan Oct 08 '12 at 19:52

What you're looking for is a "negative look-behind matching regular expression".

This isn't pretty, but it should split the list for you:

var output = input.replace(/(\\)?\|/g, function($0, $1){ return $1 ? $0 : '\n'; });

This takes your input string and replaces every '|' character that is NOT immediately preceded by a '\' character with a '\n' character; an escaped '\|' is left unchanged. Splitting the result on '\n' then gives the components.
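To make the two ideas in this answer concrete, here is a hedged sketch of both. The function names are hypothetical. The first variant assumes the input contains no '\n' characters of its own; the second assumes an engine with look-behind support (ES2018 or later), which the browsers named in the question did not have at the time.

```javascript
// Variant 1: mark unescaped pipes with '\n', then split on '\n'.
// Assumption: the input itself contains no '\n' characters.
function splitViaReplace(input) {
    var marked = input.replace(/(\\)?\|/g, function (match, backslash) {
        // Keep an escaped "\|" as it is; turn a bare "|" into the marker.
        return backslash ? match : '\n';
    });
    return marked.split('\n');
}

// Variant 2: split directly with a negative look-behind.
// The pattern is built with new RegExp so that engines without
// look-behind support only fail when this function is called.
function splitViaLookbehind(input) {
    return input.split(new RegExp('(?<!\\\\)\\|'));
}

var viaReplace = splitViaReplace('1|2|3\\|4|5');
// viaReplace is ['1', '2', '3\\|4', '5']
```

Both variants only look at the single character before the pipe, so an escaped backslash followed by a pipe (\\|) is not treated as a separator.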

Andrew Coonce
  • Even shorter: `return $1||$0+'\n'`. But does this solution work? – elclanrs Oct 05 '12 at 21:46
  • @elclanrs `$1` will be `""` if the pipe was not preceded by a backslash, otherwise it will contain that very character. An empty string evaluates to false. – Kijewski Oct 06 '12 at 02:50

A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).

Without using Regex, if I understood what you desire, this should do the job:

function doSplit(input) {
    var output = [];
    // Start at -1 so that a pipe at index 0 is also found.
    var currPos = -1,
        prevPos = -1;
    while ((currPos = input.indexOf('|', currPos + 1)) !== -1) {
        // Skip pipes that are escaped by a preceding backslash.
        if (input.charAt(currPos - 1) === "\\") continue;
        // Collect the text between the previous separator and this one.
        output.push(input.substring(prevPos + 1, currPos));
        prevPos = currPos;
    }
    // Collect whatever follows the last separator.
    output.push(input.substring(prevPos + 1));
    return output;
}
doSplit('1|2|3\\|4|5'); // returns [ '1', '2', '3\\|4', '5' ]
Mamsaac