I've used .split() dozens of times to convert strings into arrays. This is the first time I have received an unexpected result. Here, split returns an array with empty strings. This is odd, as there are no spaces in the string being split.

Is there a way to avoid these empty strings with the regex itself, or do I simply need to remove them with an extra line of code?

let infix = "(A+B)*C";

let infixArr = infix.split(/(\W)/g);
console.log(infixArr);

Yields   --> [ '', '(', 'A', '+', 'B', ')', '', '*', 'C' ]
Expected --> [ '(', 'A', '+', 'B', ')', '*', 'C' ]
Ryan
  • The empty strings are between 2 separators. – mdatsev Jan 02 '18 at 00:41
  • Possible duplicate of [regex - documentation on empty strings before and after characters](https://stackoverflow.com/questions/33131980/regex-documentation-on-empty-strings-before-and-after-characters) – Jongware Jan 02 '18 at 00:43

3 Answers

The problem is that the string starts with a separator and also has two separators next to each other, so the string before the first separator and the string between the adjacent ones are both empty. You could remove the empty strings using filter:

"(A+B)*C".split(/(\W)/g).filter(c => c !== ''); 
//=> ["(", "A", "+", "B", ")", "*", "C"]
mdatsev
  • Works beautifully without the + in \W+. I was unaware of this behaviour. Thanks for the clarification and response. – Ryan Jan 02 '18 at 00:48
  • Yeah I accidentally pasted it with a + because I was testing with that :D – mdatsev Jan 02 '18 at 00:50
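
The difference between \W and \W+ raised in the comments above can be seen in a minimal sketch. The variable names are illustrative only. filter(Boolean) is a common shorthand that is not in the original answer; it behaves like the c !== '' check here because the empty string is falsy. The last split is a different technique from the answer, a zero-width lookaround split (lookbehind assumes an ES2018-capable engine), shown as one possible regex-only way to avoid the empty strings.

```javascript
const infix = "(A+B)*C";

// Capturing \W keeps every separator, but empty strings appear before the
// leading "(" and between the adjacent separators ")" and "*".
const raw = infix.split(/(\W)/);
// raw is ['', '(', 'A', '+', 'B', ')', '', '*', 'C']

// Drop the empty strings afterwards ('' is falsy, so Boolean rejects it).
const tokens = raw.filter(Boolean);
// tokens is ['(', 'A', '+', 'B', ')', '*', 'C']

// With \W+ the adjacent separators ")" and "*" are captured as ONE token,
// which is why the "+" quantifier does not fit this use case.
const merged = infix.split(/(\W+)/).filter(Boolean);
// merged is ['(', 'A', '+', 'B', ')*', 'C']

// Alternative, not from the answer: split at the zero-width positions just
// after or just before a non-word character, so nothing is consumed.
const viaLookaround = infix.split(/(?<=\W)|(?=\W)/);
// viaLookaround is ['(', 'A', '+', 'B', ')', '*', 'C']

console.log(tokens);
console.log(merged);
console.log(viaLookaround);
```

The lookaround version needs no filter step, at the cost of a less readable pattern.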

With the split method, you get empty strings when the string starts or ends with a separator, or when there is nothing between two separators, as @mdatsev commented. You can instead use the String.prototype.match method with the regex \W|\w+ to extract the tokens you need:

let infix = "(A+B)*C";

console.log(
  infix.match(/\W|\w+/g)
);
//=> ["(", "A", "+", "B", ")", "*", "C"]
Psidom
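
As a rough sketch of how the match approach might be wrapped for more complex inputs, the helper below handles multi-character operands and an empty input. The name tokenize and the sample expressions are hypothetical and not part of the answer.

```javascript
// Hypothetical helper; the name is an assumption made for illustration.
function tokenize(expression) {
  // \W matches one non-word character (an operator or a parenthesis);
  // \w+ matches a run of word characters (a multi-character operand).
  // match() returns null when nothing matches, so fall back to [].
  return expression.match(/\W|\w+/g) || [];
}

console.log(tokenize("(A+B)*C"));      // [ '(', 'A', '+', 'B', ')', '*', 'C' ]
console.log(tokenize("(foo+12)*bar")); // [ '(', 'foo', '+', '12', ')', '*', 'bar' ]
console.log(tokenize(""));             // []
```

Note that whitespace also counts as \W, so an input such as "A + B" would yield the spaces as separate tokens; they would need to be filtered out or excluded from the pattern.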

The empty strings appear because \W splits on any non-word character, which includes the parentheses. The opening parenthesis sits at the very start of the string, so it separates an empty string from A. The closing parenthesis is immediately followed by another non-word character, the *, hence the second empty string: there is nothing between them.

The non-word characters are then spliced back into the result array because the regex wraps \W in a capturing group, as the documentation for split describes:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
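
The effect of the capturing parentheses can be checked with a minimal sketch (variable names are illustrative). It suggests that the empty strings come from the split positions themselves; the capturing group only adds the separators back.

```javascript
const infix = "(A+B)*C";

// No capturing group: separators are discarded, but the empty strings
// before "(" and between ")" and "*" are still there.
const withoutCapture = infix.split(/\W/);
// withoutCapture is ['', 'A', 'B', '', 'C']

// Capturing group: every matched separator is spliced into the output.
const withCapture = infix.split(/(\W)/);
// withCapture is ['', '(', 'A', '+', 'B', ')', '', '*', 'C']

console.log(withoutCapture);
console.log(withCapture);
```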

If you just want to split the string into its constituent characters, why not just call infix.split(''), out of interest?

Chris Applegate
  • Or `Array.from(infix)` or `[...infix]`. You actually get better support for astral characters this way. `split('')` will separate low and high surrogates. – MinusFour Jan 02 '18 at 00:53
  • Your result works fine for this example. More complex examples are ahead for which the regex is best suited. – Ryan Jan 02 '18 at 00:56
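
The surrogate-pair point from the comment above can be illustrated with a short sketch. The sample string is an assumption chosen for the demonstration; U+1F600 lies outside the Basic Multilingual Plane, so it is stored as two UTF-16 code units.

```javascript
// "a", U+1F600 (an emoji), "b"; written with an escape to avoid encoding issues.
const text = "a\u{1F600}b";

// split('') works on UTF-16 code units, so the emoji is cut into two
// lone surrogates and the result has 4 elements.
const byCodeUnit = text.split("");

// Array.from and the spread operator use the string iterator, which walks
// code points, so the emoji stays whole and the result has 3 elements.
const byCodePoint = Array.from(text);
const bySpread = [...text];

console.log(byCodeUnit.length);  // 4
console.log(byCodePoint.length); // 3
console.log(bySpread.length);    // 3
```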