Given a comma-separated string such as

'UserName,Email,[a,b,c]'

I want to split it into an array of only the outermost elements, so the expected result is

['UserName','Email', '[a,b,c]']

string.split(',') splits at every comma, including the ones inside the brackets, so it won't work here. Any suggestions? This is breaking a CSV reader I have.

  • split it at commas and detect the cases where to reintegrate. option b: write your own (stateful) split method. – Psi Aug 17 '22 at 17:27

2 Answers

I've written 2 similar answers before, so I might as well write a 3rd instead of referring you there. It's a stateful split. It doesn't support nested arrays, but it can easily be extended to do so.

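var str = 'UserName,Email,[a,b,c]';

// Stateful split: a comma ends a token only while we are outside [...].
// Note: the brackets themselves are not kept in the bracketed token,
// and empty fields are skipped.
function tokenize(str) {
  var state = "normal"; // "normal" = outside brackets, "quotes" = inside [...]
  var tokens = [];
  var current = "";
  for (var i = 0; i < str.length; i++) {
    var c = str[i];

    if (state === "normal") {
      if (c === ',') {
        // Top-level comma: close the current token, if there is one.
        if (current) {
          tokens.push(current);
          current = "";
        }
        continue;
      }
      if (c === '[') {
        // Opening bracket: start collecting a bracketed token.
        state = "quotes";
        current = "";
        continue;
      }
      current += c;
    }
    if (state === "quotes") {
      if (c === ']') {
        // Closing bracket: the bracketed token is complete.
        state = "normal";
        tokens.push(current);
        current = "";
        continue;
      }
      // Commas inside the brackets are ordinary characters.
      current += c;
    }
  }
  // Flush whatever is left after the last comma.
  if (current) {
    tokens.push(current);
    current = "";
  }
  return tokens;
}


console.log(tokenize(str)); // ["UserName", "Email", "a,b,c"]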
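
As a rough sketch of how the nested-array case mentioned above might be handled, the two-state flag can be swapped for a bracket depth counter. This is not the code from the answer: `splitOutermost` is a hypothetical name, and the sketch assumes brackets are balanced. Unlike `tokenize`, it keeps the brackets in the output (as the question's expected result does) and keeps empty fields.

```javascript
// Hypothetical helper: split on commas that sit outside any [...] group.
function splitOutermost(input) {
  const tokens = [];
  let current = "";
  let depth = 0; // number of '[' that are currently open

  for (const ch of input) {
    if (ch === "[") {
      depth++;
    } else if (ch === "]" && depth > 0) {
      depth--;
    }

    if (ch === "," && depth === 0) {
      // A top-level comma ends the current field (empty fields are kept).
      tokens.push(current);
      current = "";
    } else {
      // Everything else, including commas inside brackets, is field content.
      current += ch;
    }
  }

  tokens.push(current); // the last field has no trailing comma
  return tokens;
}

console.log(splitOutermost("UserName,Email,[a,b,c]"));
// ["UserName", "Email", "[a,b,c]"]
console.log(splitOutermost("id,[a,[b,c]],,z"));
// ["id", "[a,[b,c]]", "", "z"]
```

Counting depth instead of toggling a flag is what lets an inner `]` close only the inner array, so the outer field is not cut short.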
IT goldman

You can do this by matching the string against this regex:

 /(^|(?<=,))\[[^[]+\]|[^,]+((?=,)|$)/

let string = '[a,b,c],UserName,[1,2],Email,[a,b,c],password';
let regex = /(^|(?<=,))\[[^[]+\]|[^,]+((?=,)|$)/g;
let output = string.match(regex);
console.log(output);
// ["[a,b,c]", "UserName", "[1,2]", "Email", "[a,b,c]", "password"]

The regex can be summarized as:

Match either an array or a string that's enclosed by commas or at the start/end of our input

The key token we're using is the alternation operator |, which works as a sort of "either this, or that". Since the regex engine is eager, it moves on as soon as one alternative matches. So if we match an array, we move on and don't consider what's inside it.

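We can break it down into 3 main sections. Note that | has the lowest precedence, so the first section applies only to the array alternative and the last section only to the plain-string alternative: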

  • (^|(?<=,))

    • ^ Match from the beginning of our string
    • | Alternatively
    • (?<=,) Match only at a position that's preceded by a comma, without including the comma in the match. Read more about positive lookaround here.
  • \[[^[]+\] | [^,]+

    • \[[^[]+\] Match a string that starts with [ and ends with ], with one or more characters in between that aren't [
      • This is because, without that restriction, [1,2],[a,b] could be matched as a whole, since it starts with [ and ends with ]. Excluding [ prevents that, because a second [ indicates the start of the next array.
    • | Alternatively
    • [^,]+ Match a string of one or more characters that doesn't contain a comma. This is for the same reason as the brackets above: in ,asd,qwe, technically all of asd,qwe is enclosed by commas.
  • ((?=,)|$)

    • (?=,) Match only if the string is followed by a comma, without including the comma in the match
    • | Alternatively
    • $ Match a string that ends at the end of the main string. Read here for a better explanation.
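
To tie the breakdown back to the question's input, here is a minimal sketch that wraps the same pattern in a function. The names `OUTER_FIELD` and `matchOuterFields` are made up for illustration, and the lookbehind `(?<=,)` assumes an engine with ES2018 support.

```javascript
// Same pattern as above; the g flag makes match() return every match.
const OUTER_FIELD = /(^|(?<=,))\[[^[]+\]|[^,]+((?=,)|$)/g;

// Hypothetical wrapper: returns [] instead of null when nothing matches.
function matchOuterFields(input) {
  return input.match(OUTER_FIELD) ?? [];
}

console.log(matchOuterFields("UserName,Email,[a,b,c]"));
// ["UserName", "Email", "[a,b,c]"]

// Empty fields produce no match, so they are dropped from the result.
console.log(matchOuterFields("a,,b"));
// ["a", "b"]
```

Because the array alternative excludes [ between the brackets, this approach is limited to flat arrays. A nested input such as x,[a,[b,c]] is not split at its outermost level.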
Brother58697