7

I am trying to split a string on whitespace (\s), but only where the whitespace is not inside a "quoted" section.

I am matching each quoted section, including its contents, as follows:

(['"`]).*?\1


However, when I try to add this as a negative lookahead, to only split on white spaces outside of those quotes, I can't get it to work:

\s(?!(['"`]).*?\1)


How can I only split on the white spaces that are not in "quotes"?

KevBot

3 Answers

11
\s(?=(?:[^'"`]*(['"`])[^'"`]*\1)*[^'"`]*$)

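You can split on this regex, which uses a lookahead: a whitespace character matches only if the rest of the string consists of complete quote pairs, so the whitespace itself cannot be inside quotes. See the demo: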

https://regex101.com/r/5I209k/4

Or, for mixed tick types:

https://regex101.com/r/5I209k/7
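
As a minimal JavaScript sketch of the lookahead idea (an adaptation, not the exact regex above): `String.prototype.split` splices captured groups into its result, so the backreferenced group `(['"`])...\1` is expanded here into three non-capturing alternatives that match the same thing. The helper name `splitOutsideQuotes` and the sample string are assumptions for illustration.

```javascript
// Adaptation for String.prototype.split: the backreference group is expanded
// into three non-capturing alternatives, because split() would otherwise
// splice the captured quote character into the result array.
const OUTSIDE_QUOTES =
  /\s(?=(?:[^'"`]*(?:'[^'"`]*'|"[^'"`]*"|`[^'"`]*`))*[^'"`]*$)/;

// Hypothetical helper name, not taken from the answer.
function splitOutsideQuotes(input) {
  // A whitespace character is a split point only when everything after it
  // consists of complete, non-nested quoted sections.
  return input.split(OUTSIDE_QUOTES);
}

const sample = 'run "hello world" \'a b\' `c d` end';
console.log(splitOutsideQuotes(sample));
// [ 'run', '"hello world"', "'a b'", '`c d`', 'end' ]
```

As noted in the comments, this assumes balanced quotes with no quote of one type nested inside another; such inputs will not split as intended.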

vks
  • This doesn't work very well with mismatched or nested quotes, but for well-balanced, non-nested quotes it works like a charm. E.g. `Split on spaces, except "for this, and 'this section'", and \`ignore this too\`.` and `Split on spaces, except "for this', and 'this section', and \`ignore this too\`."` – Marcus Nov 08 '16 at 08:39
  • @Marcus I guess the OP can tell whether such a scenario would occur. Also, none of the answers would work in that case. – vks Nov 08 '16 at 08:46
  • Of course, it was just to inform future readers of the edge cases. As mentioned, it works like a charm for the example string. – Marcus Nov 08 '16 at 08:49
  • Just so you know intent, the quotes must be in pairs. I have been trying to convert a complex `replace` chain with a split regular expression to simplify the process. I am working on a code parser, and need to know when a developer is intending for a string to be passed to a function, or if it should be an object. The spaces represent the separate arguments being passed to a function. I hope that makes sense. – KevBot Nov 09 '16 at 00:00
  • What if I want to be able to escape quotes like this: `\"`? This wouldn't work. – RedGuy11 Sep 01 '21 at 15:25
  • @vks Can you show a sample regex? I can't seem to figure it out. – RedGuy11 Sep 01 '21 at 15:37
  • @vks https://regex101.com/r/KAx8LX/2 How can I make it split in the areas where it says `should split here`? – RedGuy11 Sep 01 '21 at 15:44
  • @RedGuy11 https://regex101.com/r/35YZkm/1 – vks Sep 01 '21 at 18:13
2

The problem is that you need to exclude entries within the group. Instead of using a negative lookahead you could do it like this:

(\S*(?:(['"`]).*?\2)\S*)\s?|\s

Basically, it does the following:

  • captures any non-whitespace characters
    • that directly precede a quoted string
    • which is optionally followed directly by more non-whitespace characters (e.g. a comma after the quote)
  • then matches an optional trailing whitespace

OR

  • matches a single whitespace

Capture group 1 will then contain the longest possible sequence of non-whitespace characters (whitespace is allowed only within the quotes). It can thus be used with the replacement \1\n to replace the desired whitespace with newlines.

Regex101: https://regex101.com/r/A4HswJ/1

JSFiddle: http://jsfiddle.net/u1kjudmg/1/
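
A rough JavaScript sketch of this replace-then-split idea follows. It uses the regex above with the `g` flag and `$1\n` (the JavaScript spelling of the `\1\n` replacement). The helper name `splitViaReplace` and the sample input are hypothetical, and the sketch assumes the input contains no newline characters of its own.

```javascript
// Regex from the answer, with the g flag so that replace() visits every match.
const QUOTED_OR_SPACE = /(\S*(?:(['"`]).*?\2)\S*)\s?|\s/g;

// Hypothetical helper: mark each split point with "\n", then split on it.
// Assumption: the input itself contains no newline characters.
function splitViaReplace(input) {
  return input
    .replace(QUOTED_OR_SPACE, '$1\n') // $1 is empty when the bare \s branch matched
    .split('\n')
    .filter((part) => part !== ''); // drop the empty piece left by a trailing "\n"
}

console.log(splitViaReplace('call "first arg", \'second arg\' done'));
// [ 'call', '"first arg",', "'second arg'", 'done' ]
```

The filter step is only there because a quoted section at the very end of the input leaves a trailing newline behind.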

Marcus
0

I'd use a simpler approach; there is no need for advanced features:

'([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S*

meaning:

  • a single-quoted section '([^']|\\.)*'
  • or | a double-quoted section "([^"]|\\.)*"
  • or | a back-quoted section (can't place it inline in SO markdown)
  • or | an un-quoted section \S*

This will also split quoted parts off from any adjacent text. If that is not wanted, you can instead use

('([^']|\\.)*'|"([^"]|\\.)*"|`([^`]|\\.)*`|\S)+

i.e. find sequences of tokens where each token is either a non-whitespace character or a quoted section.
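
A small JavaScript sketch of this token-matching approach, with two changes that are assumptions rather than part of the pattern above: the groups are non-capturing, and `\\.` is tried before the character class. In a backtracking engine such as JavaScript's, `[^"]` listed first would consume the backslash by itself, and the quoted section would then end at the escaped quote. The helper name `tokenize` and the sample inputs are hypothetical.

```javascript
// Each token is a run of quoted sections and/or non-whitespace characters.
// Assumption: `\\.` comes before the character class so that a backslash is
// consumed together with the character it escapes.
const TOKEN =
  /(?:'(?:\\.|[^'])*'|"(?:\\.|[^"])*"|`(?:\\.|[^`])*`|\S)+/g;

// Hypothetical helper: collect tokens instead of splitting on separators.
function tokenize(input) {
  // With the g flag, match() returns every full match, or null if there is none.
  return input.match(TOKEN) || [];
}

console.log(tokenize('call "first arg", \'second arg\' done'));
// [ 'call', '"first arg",', "'second arg'", 'done' ]

console.log(tokenize('say "a \\" b" now'));
// three tokens: say, "a \" b", now
```

An unbalanced quote falls through to the `\S` branch and is treated as an ordinary character.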

6502
  • Per the example, wouldn't this break `"for this",` into two parts (`"for this"` and `,`)? – Marcus Nov 08 '16 at 07:56
  • @Marcus Not the second one. @6502 How can I make it work with escaped quotes, i.e. `\"`? – RedGuy11 Sep 01 '21 at 15:28
  • @RedGuy11: The regexp `"([^"]|\\.)*"` already handles escaped double-quotes (the internal part means "either not a double quote or a backslash followed by anything") – 6502 Sep 01 '21 at 20:00