
I am currently building a small text editor for a custom file format. I have a GUI, but I also implemented a small output console. I want to add a very basic input field to execute commands and pass parameters. A command would look like this:

compile test.json output.bin -location "Paris, France" -author "Charles \"Demurgos\""

My problem is getting an array of the space-separated arguments while preserving the double-quoted parts, which may be strings generated by JSON.stringify and can therefore contain escaped double quotes.

To be clear, the expected array for the previous command is:

[
    'compile',
    'test.json',
    'output.bin',
    '-location',
    '"Paris, France"',
    '-author',
    '"Charles \\"Demurgos\\""'
]

Then I can iterate over this array and apply JSON.parse to every item for which indexOf('"') == 0 to get the final result:

[
    'compile',
    'test.json',
    'output.bin',
    '-location',
    'Paris, France',
    '-author',
    'Charles "Demurgos"'
]
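The decoding step described above could be sketched as follows. This is a minimal sketch, not code from the question: `decodeArgs` is a hypothetical helper name, and it assumes every token that starts with a double quote is valid JSON.

```javascript
// Hypothetical helper: JSON-decode the tokens that start with a double quote
// and leave every other token untouched.
function decodeArgs(tokens) {
  return tokens.map(function (token) {
    return token.indexOf('"') === 0 ? JSON.parse(token) : token;
  });
}

// The intermediate array from the question (quoted parts still encoded).
var rawTokens = [
  'compile',
  'test.json',
  'output.bin',
  '-location',
  '"Paris, France"',
  '-author',
  '"Charles \\"Demurgos\\""'
];

var decodedArgs = decodeArgs(rawTokens);
console.log(decodedArgs);
```

JSON.parse throws a SyntaxError on a malformed token, so a real console would probably want to catch that and report it to the user.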

Thanks to this question: Split a string by commas but ignore commas within double-quotes using Javascript, I was able to get what I need as long as the arguments do NOT contain any double quotes. Here is the regex I got:

/(".*?"|[^"\s]+)(?=\s*|\s*$)/g

But it ends the current argument as soon as it encounters a double quote, even an escaped one. How can I adapt this regex to distinguish escaped from unescaped double quotes? And what about edge cases such as action "windowsDirectory\\" otherArg? Here the backslash is itself escaped, so the double quote that follows it should end the argument. This is a problem I avoided as long as possible in previous projects, but I feel it's time for me to learn how to properly take escape characters into account.

Here is a JSFiddle: http://jsfiddle.net/GwY8Y/1/ You can see that the beginning is parsed correctly, but the last argument is split into several pieces.
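For readers without access to the fiddle, the failure can be reproduced with a few lines. This is only an illustrative sketch: the variable names are made up, and String.raw is used so that the backslashes in the command reach the regex unchanged.

```javascript
// The regex from the question, applied to the sample command.
var originalPattern = /(".*?"|[^"\s]+)(?=\s*|\s*$)/g;

// String.raw keeps \" as two characters (backslash + quote), as typed by a user.
var failingCommand = String.raw`compile test.json output.bin -location "Paris, France" -author "Charles \"Demurgos\""`;

var brokenParts = failingCommand.match(originalPattern);

// The first six tokens are fine; the last argument comes back in three pieces
// because the lazy ".*?" stops at the first double quote, escaped or not.
console.log(brokenParts.slice(6));
```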

Thank you for any help.

Demurgos

1 Answer


This regex will give you the strings you need (see demo):

"(?:\\"|\\\\|[^"])*"|\S+

Use it like this:

your_array = subject.match(/"(?:\\"|\\\\|[^"])*"|\S+/g);

Explanation of the regex:

"                        # '"'
(?:                      # group, but do not capture (0 or more times
                         # (matching the most amount possible)):
  \\                     #   '\'
  "                      #   '"'
 |                       #  OR
  \\\\                   #   two backslashes
 |                       #  OR
  [^"]                   #   any character except: '"'
)*                       # end of grouping
"                        # '"'
|                        # OR
\S+                      # non-whitespace (all but \n, \r, \t, \f,
                         # and " ") (1 or more times (matching the
                         # most amount possible))
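As a quick check, the pattern can be run against both the sample command and the trailing-backslash edge case from the question. This is a sketch under assumptions: `splitCommand` and the variable names are hypothetical, and the inputs are written with String.raw so the backslashes are exactly what a user would type.

```javascript
// The pattern from this answer: a double-quoted chunk (allowing \" and \\ inside)
// or a run of non-whitespace characters.
var argPattern = /"(?:\\"|\\\\|[^"])*"|\S+/g;

// Hypothetical wrapper: returns an empty array instead of null when nothing matches.
function splitCommand(line) {
  return line.match(argPattern) || [];
}

var sampleCommand = String.raw`compile test.json output.bin -location "Paris, France" -author "Charles \"Demurgos\""`;
var edgeCommand = String.raw`action "windowsDirectory\\" otherArg`;

var sampleTokens = splitCommand(sampleCommand);
var edgeTokens = splitCommand(edgeCommand);

console.log(sampleTokens); // 7 tokens, the quoted ones still JSON-encoded
console.log(edgeTokens);   // 3 tokens: the escaped backslash does not swallow the closing quote
console.log(JSON.parse(edgeTokens[1])); // the directory name with one trailing backslash
```

Tokens that start with a double quote can then be passed through JSON.parse, as described in the question.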
zx81
  • @Demurgos Hey Demurgos, did this help with your question? Or are you still wrestling with it? Thanks for letting me know. :) – zx81 Jun 06 '14 at 10:08
  • Sorry, I had a weird bug this morning and could only access Stack Overflow through a proxy. First of all: thank you, it works fine... but it fails on the case I discuss in my last paragraph: if the quoted string ends with backslashes that escape each other (as in "windowsDirectory\\"), the double quote that follows should be a closing double quote, but here it is ignored because of the escaped backslash. I'm trying to fix it with http://tinyurl.com/p2k32ty but it fails without lookbehind in JS. Here is the buggy string: http://regex101.com/r/dG5yE6 – Demurgos Jun 06 '14 at 12:49
  • @Demurgos Ha, a new case. :) I edited the answer (see new [demo](http://regex101.com/r/cA7kY1)) Pls let me know if that works for you. :) – zx81 Jun 06 '14 at 18:26
  • I tried a few other cases, and it worked! Thank you =) (I still have to practice regular expressions, but this was a general issue for me, so I feel I will be able to extend it for my future needs) – Demurgos Jun 06 '14 at 18:51
  • @Demurgos Btw for a general solution to exclude certain situations from a regex, I highly recommend you have a look at [this question](http://stackoverflow.com/questions/23589174/match-or-replace-a-pattern-except-in-situations-s1-s2-s3-etc/23589204#23589204) or save it for later, I had a lot of fun writing the answer. :) Here we didn't use this technique as there was no need. – zx81 Jun 06 '14 at 18:55
  • @Demurgos Terrific. Hanging up now, catch you some other time. :) – zx81 Jun 06 '14 at 18:58