2

I want to use the JS String split function to split this string only on plain commas (,), and not on commas preceded by a backslash (\,). How can I do this?

'this,is\,a,\,string'.split(/,/)

This code splits on every comma. I'm not sure how to get it to split only on the commas not preceded by backslashes.

user779159
  • If there is always a *word character* preceding the comma you want to split at, you can [use a *word boundary*](https://regex101.com/r/KEPDtJ/1). Not sure if this is sufficient for all your input. – bobble bubble May 07 '17 at 12:16
  • Can you please give an example? – user779159 May 07 '17 at 12:18
  • Why would you want to do that? Please give more context. It seems like someone made a mistake escaping commas with backslashes, but not escaping backslashes. If that's the case, two different lists of strings can be encoded as the same string, and it's impossible to decode it without ambiguity. –  May 07 '17 at 12:20
  • 1
    You need to add extra backslashes in the string: `'this,is\\,a,\\,string'.split(/\b,\b/);` – Taufik Nurrohman May 07 '17 at 12:22
  • Possible duplicate of [Javascript: negative lookbehind equivalent?](http://stackoverflow.com/questions/641407/javascript-negative-lookbehind-equivalent) – Imanuel May 07 '17 at 12:48

5 Answers

8

Since lookbehinds are not supported in JavaScript, it's hard to define a "not preceded by something" pattern for split. However, you can define a "word" as a sequence of non-commas or escaped commas:

(?:\\,|[^,])+

(demo: https://regex101.com/r/d5W21v/1)

and extract all "word" matches:

var matches = "this,is\\,a,\\,string".match(/(?:\\,|[^,])+/g);
console.log(matches);
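
A minimal sketch of how the match-based approach could be wrapped in a reusable function. The name `splitUnescapedCommas` and the optional unescaping step are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper built around the pattern from the answer above.
function splitUnescapedCommas(input) {
  // A field is a run of escaped commas (\,) or any non-comma characters.
  // match() returns null when nothing matches, so fall back to an empty array.
  const fields = input.match(/(?:\\,|[^,])+/g) || [];
  // Optional extra step (an assumption): turn each escaped comma into a plain comma.
  return fields.map(field => field.replace(/\\,/g, ','));
}

const fields = splitUnescapedCommas('this,is\\,a,\\,string');
console.log(fields.join(' | ')); // this | is,a | ,string
```

One thing to be aware of: because the pattern requires at least one character per match, empty fields are dropped, so `'a,,b'` yields two items here whereas `split` would yield three.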
Dmitry Egorov
  • 1
    It's probably good to allow escaping of the slash, so '\' becomes '\\' and ',' becomes '\,'. The regex thus becomes (?:\\,|\\\\|[^,])+ – Jeow Li Huan Jul 21 '23 at 03:48
1

Replace the non-splitting symbol with a temporary symbol, split, and then restore the non-splitting symbol

 'this,is\\,a,\\,string'.replace(/\\,/g, '##NONBREAKING##').split(',')

Then loop over the resulting array, replacing '##NONBREAKING##' with '\,'.

Obviously, the temporary symbol '##NONBREAKING##' must be something that can never occur in the text you are splitting. Perhaps include some Unicode characters that are hard to type? Or include characters from multiple different languages (e.g. Chinese, Russian, Indian, Native American) that are unlikely to appear together in genuine text.
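
The full replace, split, and restore round trip might look like the sketch below. The function name `splitWithPlaceholder` is hypothetical, and the code assumes the placeholder never occurs in the input.

```javascript
// Assumption: this placeholder never appears in the text being split.
const PLACEHOLDER = '##NONBREAKING##';

// Hypothetical helper illustrating the three steps described above.
function splitWithPlaceholder(input) {
  return input
    .split('\\,').join(PLACEHOLDER) // 1. hide every escaped comma
    .split(',')                     // 2. split on the remaining commas
    .map(part => part.split(PLACEHOLDER).join('\\,')); // 3. restore
}

const pieces = splitWithPlaceholder('this,is\\,a,\\,string');
console.log(pieces.join(' | ')); // this | is\,a | \,string
```

Using `split(...).join(...)` for the substitution replaces every occurrence and avoids having to escape the placeholder for use in a regular expression.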

ProfDFrancis
  • Agreed. But it should be possible to come up with a very implausible combination, e.g. ┋₪↝⅊﷼ Ðᵯ✈ ℈◆ᾋ. – ProfDFrancis May 07 '17 at 12:35
1

I think what you're looking for is called "Negative Lookbehind" - a regex element that looks back in the string and makes sure the pattern is not preceded by another pattern.

However, JavaScript doesn't natively support lookbehind. It does, however, support lookahead (both negative and positive).

So you could:

1. Reverse the string.
2. Split on each comma that is not followed by a backslash (in the reversed string, the escaping backslash comes after the comma).
3. Reverse each word back.
4. Reverse the order of the words.

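var temp = "this,is\\,a,\\,string"
var reversed = temp.split('').reverse().join('')
// In the reversed string an escaped comma reads ",\", so skip commas followed by a backslash
var words = reversed.split(/,(?!\\)/).map(x => x.split('').reverse().join(''))
var finalResult = words.reverse()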

It's kind of cumbersome though...

Yossi Vainshtein
0

Alternatively, you can write a custom function that returns an array. Iterate over the string and, whenever you find a comma that is not preceded by a backslash, take the substring that ends at that comma. You need a counter that tracks the position just after the last comma you split on.

Hope this can be helpful
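
A minimal sketch of such a function, assuming only that a comma counts as escaped when the character immediately before it is a backslash. The name `splitOnUnescapedCommas` is made up for illustration.

```javascript
// Hypothetical helper: splits on commas whose previous character is not a backslash.
function splitOnUnescapedCommas(input) {
  const parts = [];
  let start = 0; // index where the current piece begins
  for (let i = 0; i < input.length; i++) {
    // At i === 0, input[i - 1] is undefined, which is not a backslash.
    if (input[i] === ',' && input[i - 1] !== '\\') {
      parts.push(input.substring(start, i));
      start = i + 1; // the next piece starts just after this comma
    }
  }
  parts.push(input.substring(start)); // whatever follows the last split
  return parts;
}

const result = splitOnUnescapedCommas('this,is\\,a,\\,string');
console.log(result.join(' | ')); // this | is\,a | \,string
```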

  • Can you please give an example? – user779159 May 07 '17 at 12:17
  • You create an empty array. Then use a for loop to iterate through the string. If a comma is found and the previous character is not a backslash, take the substring from the last split position (zero at the beginning) up to the comma. Update the counter to the comma position so you know where the next substring starts. –  May 07 '17 at 12:22
0

This method is only currently supported in Chrome 62 (desktop and Android), Opera 49, and Node.js 8.10

A limited set of JavaScript engines now support lookbehinds, so the following works in supported environments:

console.log('this,is\\,a,\\,string'.split(/(?<!\\),/))

Since this doesn't currently work in Firefox, Safari, or iOS Chrome (among others), it's not particularly useful for client-side development, but it is useful for Node apps.

Mozilla has an up-to-date browser compatibility section for regex lookbehinds.
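
If the same code has to run in engines with and without lookbehind support, one option is to detect support at run time. The sketch below is an assumption-laden illustration: `makeCommaSplitter` is a hypothetical name, and the fallback is a simple character scan rather than anything prescribed above.

```javascript
// Hypothetical helper: returns a split function suited to the current engine.
function makeCommaSplitter() {
  try {
    // Built from a string so that an engine without lookbehind throws here
    // (and is caught) rather than failing to parse the whole script.
    const unescapedComma = new RegExp('(?<!\\\\),');
    return text => text.split(unescapedComma);
  } catch (err) {
    // Fallback for engines without lookbehind: scan character by character.
    return text => {
      const parts = [];
      let start = 0;
      for (let i = 0; i < text.length; i++) {
        if (text[i] === ',' && text[i - 1] !== '\\') {
          parts.push(text.slice(start, i));
          start = i + 1;
        }
      }
      parts.push(text.slice(start));
      return parts;
    };
  }
}

const splitOnCommas = makeCommaSplitter();
console.log(splitOnCommas('this,is\\,a,\\,string').join(' | '));
// this | is\,a | \,string
```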

dx_over_dt
  • On node 11.3 I tried `'this,is\,a,\,string'.split(/(?<!\\),/)` and it gave the same result as the code in my question `'this,is\,a,\,string'.split(/,/)`. Is the code you posted working correctly for you in node? – user779159 Feb 03 '19 at 14:48
  • The fact that it didn't throw an error means that lookbehinds are supported in your version of Node. It looks like your issue is that you aren't escaping your ``\``'s. Your backslashes are escaping the commas, which are just commas, so it's evaluating the regex over the string `this,is,a,string`. – dx_over_dt Feb 03 '19 at 22:10