Given a string like the following in JavaScript:

var a = 'hello world\n\nbye world\n\nfoo\nbar\n\nfoo\nbaz\n\n';

I want to split it into an array like this:

['hello world', '\n\n', 'bye world', '\n\n', 'foo\nbar', '\n\n', 'foo\nbaz', '\n\n'].

If the input is var a = 'hello world\n\nbye world', the result should be ['hello world', '\n\n', 'bye world'].

In other words, I want to split the string around '\n\n' into an array such that the array contains the '\n\n' as well. Is there any neat way to do this in JavaScript?

Lone Learner
  • What do you want `'a\n\n\n\nb'` to split into? – Cameron Mar 24 '12 at 19:24
  • @AdamZalcman I have tried running a loop over the string and using the `indexOf` and `substr` functions to extract what I want, but I am wondering if there is a neater way to solve this. – Lone Learner Mar 24 '12 at 19:27
  • @Cameron Either of the two is okay for me: ['a', '\n\n', 'b'] and ['a', '\n\n', '\n\n', 'b']. The output for this input is immaterial for my particular case, so we can choose whichever leads to more elegant code. – Lone Learner Mar 24 '12 at 19:29
  • For a similar problem where the separator is to be retained within the split elements, see: http://stackoverflow.com/questions/12001953/javascript-and-regex-split-string-and-keep-the-separator – Jens Jensen Apr 19 '13 at 19:12

1 Answer

Here’s a one-liner:

str.match(/\n\n|(?:[^\n]|\n(?!\n))+/g)

Here’s how it works:

  • \n\n matches the two consecutive newline characters
  • (?:[^\n]|\n(?!\n))+ matches any sequence of one or more characters, each of which is either
    • [^\n] not a newline character, or
    • \n(?!\n) a newline character but only if not followed by another newline character
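To make the breakdown above concrete, here is a small sketch that runs the one-liner against the inputs from the question and the `'a\n\n\n\nb'` case from the comments. The variable names (`pattern`, `parts`, `shortParts`, `four`, `none`) are only for illustration.

```javascript
// The pattern from the answer: a "\n\n" delimiter, or a run of text that contains no "\n\n".
var pattern = /\n\n|(?:[^\n]|\n(?!\n))+/g;

var a = 'hello world\n\nbye world\n\nfoo\nbar\n\nfoo\nbaz\n\n';
var parts = a.match(pattern);
// ['hello world', '\n\n', 'bye world', '\n\n', 'foo\nbar', '\n\n', 'foo\nbaz', '\n\n']

var shortParts = 'hello world\n\nbye world'.match(pattern);
// ['hello world', '\n\n', 'bye world']

// Four consecutive newlines come out as two separate delimiters.
var four = 'a\n\n\n\nb'.match(pattern);
// ['a', '\n\n', '\n\n', 'b']

// match() returns null when nothing matches (for example on an empty string),
// so fall back to an empty array if an array is always expected.
var none = ''.match(pattern) || [];
```

Note that a single newline stays inside its chunk (`'foo\nbar'`), because `\n(?!\n)` only accepts a newline that is not followed by another one.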

The same pattern can be built recursively for a delimiter of any length:

// useful function to quote strings for literal match in regular expressions
RegExp.quote = RegExp.quote || function(str) {
    return (str+"").replace(/(?=[.?*+^$[\]\\(){}|-])/g, "\\");
};
// helper function to build the above pattern recursively
function buildRecursivePattern(chars, i) {
    var c = RegExp.quote(chars[i]);
    if (i < chars.length-1) return "(?:[^" + c + "]|" + c + buildRecursivePattern(chars, i+1) + ")";
    else return "(?!" + c + ")";
}
// builds "<delimiter>|<run of text not containing the delimiter>+" as a global RegExp
function buildPattern(delimiter) {
    return new RegExp(RegExp.quote(delimiter) + "|" + buildRecursivePattern(delimiter.match(/[^]/g), 0) + "+", "g");
}

var str = 'hello world\n\nbye world\n\nfoo\nbar\n\nfoo\nbaz\n\n',
    delimiter = "\n\n",
    parts;
parts = str.match(buildPattern(delimiter))
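
As an alternative sketch (not the approach above), the "text between delimiters" part can also be written with a negative lookahead over the whole delimiter, `(?:(?!delimiter)[\s\S])+`, which avoids the recursion. The helper names `escapeRegExp` and `splitKeepingDelimiter` are hypothetical and chosen only for this example. One reason to consider it: as far as I can tell, the character-by-character pattern can step over a delimiter that starts right after a partial match (for example `'abc'` inside `'aabc'`), whereas the lookahead is checked at every position.

```javascript
// Hypothetical helper: escape regex metacharacters so the delimiter is matched literally.
function escapeRegExp(text) {
    return String(text).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split str around delimiter, keeping the delimiters in the result.
function splitKeepingDelimiter(str, delimiter) {
    var d = escapeRegExp(delimiter);
    // Either the delimiter itself, or one or more characters,
    // none of which is the start of a delimiter.
    var pattern = new RegExp(d + "|(?:(?!" + d + ")[\\s\\S])+", "g");
    return str.match(pattern) || [];
}

var result = splitKeepingDelimiter('hello world\n\nbye world\n\nfoo\nbar\n\n', '\n\n');
// ['hello world', '\n\n', 'bye world', '\n\n', 'foo\nbar', '\n\n']

var overlapping = splitKeepingDelimiter('aabc', 'abc');
// ['a', 'abc']
```

This assumes a non-empty delimiter; with an empty string the pattern would only ever produce empty matches.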

Update: Here’s a patch for String.prototype.split, for engines whose split does not include a captured separator in the result. It adds the separator matched by the first capturing group to the returned array:

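// patch split only where a captured separator is not included in the result
if ("a".split(/(a)/).length !== 3) {
    (function() {
        var _f = String.prototype.split;
        String.prototype.split = function(separator, limit) {
            if (separator instanceof RegExp) {
                // copy the separator with the global flag so exec() advances through the string
                var re = new RegExp(separator.source, "g"+(separator.ignoreCase?"i":"")+(separator.multiline?"m":"")),
                    match, result = [], counter = 0, lastIndex = 0;
                while ((match = re.exec(this)) !== null) {
                    result.push(this.substr(lastIndex, match.index-lastIndex));
                    // only the first captured group is kept
                    if (match.length > 1) result.push(match[1]);
                    lastIndex = match.index + match[0].length;
                    // step past an empty match so the loop cannot run forever
                    if (match[0].length === 0) re.lastIndex++;
                    if (++counter === limit) break;
                }
                result.push(this.substr(lastIndex));
                return result;
            } else {
                // delegate to the native implementation with the original receiver
                return _f.apply(this, arguments);
            }
        };
    })();
}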
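
On engines where the feature test above already passes (which, as far as I know, includes current browsers and Node.js), no patch is needed: split with a capturing group keeps the separators. A minimal sketch, where `isNotEmpty` is a hypothetical helper that drops the empty strings split produces at the edges and between adjacent separators:

```javascript
// Hypothetical helper: keep only non-empty pieces.
function isNotEmpty(part) {
    return part !== '';
}

var str = 'hello world\n\nbye world\n\nfoo\nbar\n\nfoo\nbaz\n\n';

// The parentheses make split() include each matched "\n\n" in the result.
var parts = str.split(/(\n\n)/).filter(isNotEmpty);
// ['hello world', '\n\n', 'bye world', '\n\n', 'foo\nbar', '\n\n', 'foo\nbaz', '\n\n']

var four = 'a\n\n\n\nb'.split(/(\n\n)/).filter(isNotEmpty);
// ['a', '\n\n', '\n\n', 'b']
```

Without the filter, the first call would end with an extra `''`, because the input ends with a separator.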
Gumbo