
Input:

books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters

The idea is to split the string on dots, but ignore any dots inside square brackets.

So after splitting, the result should be this array:

[
  'books',
  'copies',
  '[read_by.[p_id="65784"].page=5468]',
  'text',
  '[paragraph="20"]',
  'letters'
]

I already looked at this answer, but it doesn't work with nested square brackets, which is what I need. Also, I'm using JavaScript, so negative lookbehinds are not supported.
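For reference, here is a typical lookahead-only regex for the non-nested case. This is an assumption: the linked answer isn't reproduced here, so its exact pattern may differ. It avoids lookbehind, but on the nested input it wrongly splits inside the outer brackets:

```javascript
// Assumed non-nested approach: split on "." unless a "]" appears before the next "[".
// That test looks only at the next bracket, so it cannot tell how deeply nested a dot is.
const text = 'books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters';
const parts = text.split(/\.(?![^\[]*\])/);
console.log(parts);
// [ 'books', 'copies', '[read_by', '[p_id="65784"].page=5468]', 'text', '[paragraph="20"]', 'letters' ]
```

The dot after read_by is followed by "[" rather than "]", so the lookahead treats it as being outside brackets and splits there.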

Help is much appreciated.

Edit 1: expanded the example.

obedm503

2 Answers


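It isn't possible to do this with a regex in JavaScript, because JavaScript regexes can't match arbitrarily nested structures. You need to use the good old method: a stack (here a simple depth counter is enough, since there is only one kind of bracket).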

var text = 'books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters';

// item: current segment, result: finished segments, stack: current bracket depth
var item = '', result = [], stack = 0;

for (var i = 0; i < text.length; i++) {
    if (text[i] === '.' && stack === 0) {
        // a dot outside any brackets ends the current segment
        result.push(item);
        item = '';
        continue;
    } else if (text[i] === '[') {
        stack++;
    } else if (text[i] === ']') {
        stack--;
    }
    item += text[i];
}

// push the last segment (no trailing dot triggers it)
result.push(item);

console.log(result);
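A counter is enough here because there is only one kind of bracket. If the input could mix bracket types such as (), [] and {}, a real stack is needed so each closing bracket can be checked against the most recent opener. A minimal sketch, assuming you want an error on unbalanced input; the function name splitOutsideBrackets and its sep parameter are hypothetical, not from the question:

```javascript
// Hypothetical helper: split `text` on `sep` only when outside (), [] and {}.
// A real stack (not just a counter) lets mismatched pairs like "[)" be detected.
function splitOutsideBrackets(text, sep = '.') {
  const pairs = { ')': '(', ']': '[', '}': '{' }; // closer -> expected opener
  const stack = [];
  const result = [];
  let item = '';

  for (const ch of text) {
    if (ch === sep && stack.length === 0) {
      // top-level separator: finish the current segment
      result.push(item);
      item = '';
      continue;
    }
    if (ch === '(' || ch === '[' || ch === '{') {
      stack.push(ch);
    } else if (ch in pairs) {
      // closer must match the most recent unclosed opener
      if (stack.pop() !== pairs[ch]) {
        throw new SyntaxError(`Unbalanced '${ch}' in input`);
      }
    }
    item += ch;
  }

  if (stack.length > 0) {
    throw new SyntaxError(`Unclosed '${stack[stack.length - 1]}' in input`);
  }
  result.push(item); // last segment
  return result;
}
```

For example, splitOutsideBrackets('a.(b.{c.d}).e') returns ['a', '(b.{c.d})', 'e'], while splitOutsideBrackets('a.[b)') throws a SyntaxError.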
Casimir et Hippolyte

You need to write a parser for this, since JavaScript regexes support neither recursion nor balanced constructs.

The point of these functions is that they track how many opening delimiters (in your case, [) are currently unclosed, using a counter (level, openBrackets) that acts as the depth of a stack. When a . is found while that count is above zero, it is inside brackets and is simply appended to the current segment. When the count is zero, the . is outside brackets, so the string is split there: the current segment is pushed to the output array (result, ret).

function splitByDotsOutsideBrackets(string) {
    var openBrackets = 0, ret = [], i = 0;
    while (i < string.length) {
        var ch = string.charAt(i);
        if (ch === '[')
            openBrackets++;
        else if (ch === ']')
            openBrackets--;
        else if (ch === '.' && openBrackets === 0) {
            // split here: keep the part before the dot, keep scanning the rest
            ret.push(string.slice(0, i));
            string = string.slice(i + 1);
            i = -1; // becomes 0 after the i++ below
        }
        i++;
    }

    // whatever remains after the last top-level dot is the final segment
    if (string !== '') ret.push(string);
    return ret;
}
var res = splitByDotsOutsideBrackets('books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters');
console.log(res);

Or another variation:

function splitOnDotsOutsideNestedBrackets(str) {
    var result = [], start = 0, level = 0;
    for (var i = 0; i < str.length; ++i) {
        switch (str[i]) {
            case '[':
                ++level;
                break;

            case ']':
                if (level > 0)    // ignore a stray ']' instead of going negative
                    --level;
                break;

            case '.':
                if (level)        // inside brackets: the dot belongs to the segment
                    break;
                if (start < i)    // skip empty segments, e.g. in "a..b"
                    result.push(str.slice(start, i));
                start = i + 1;
                break;
        }
    }

    // the final segment runs from the last top-level dot to the end
    if (start < str.length)
        result.push(str.slice(start));

    return result;
}

var s = 'books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters';
console.log(splitOnDotsOutsideNestedBrackets(s));

Adapted from one of my previous answers.
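To see why a regex alone can't do this in JavaScript: without recursion, a pattern can only hard-code a fixed nesting depth. The sketch below (the names SEGMENT_DEPTH_2 and splitDepth2 are hypothetical) matches whole segments with at most two levels of brackets, which happens to cover the example, but it silently produces wrong segments once the nesting goes one level deeper:

```javascript
// Hypothetical depth-limited pattern: one segment is a run of
//   - ordinary characters (anything except ".", "[" and "]"), or
//   - a [...] group whose contents may contain one more level of [...].
// Each extra nesting level would need another copy of the bracket sub-pattern.
const SEGMENT_DEPTH_2 = /(?:[^.\[\]]|\[(?:[^\[\]]|\[[^\[\]]*\])*\])+/g;

function splitDepth2(text) {
  // match() returns null when nothing matches; top-level dots are never matched
  return text.match(SEGMENT_DEPTH_2) || [];
}

console.log(splitDepth2('books.copies.[read_by.[p_id="65784"].page=5468].text.[paragraph="20"].letters'));
```

For instance, splitDepth2('x.[a.[b.[c]]]') returns ['x', 'a', '[b.[c]]'] instead of ['x', '[a.[b.[c]]]'], which is why the counter-based functions are the more robust choice.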

Wiktor Stribiżew