
I have a string that looks like this:

var message = '"this is a question" "answer one" "answer two" "answer three"';

I want each sentence within the quotes of the string to be in an array like this:

array = ["this is a question", "answer one", "answer two", "answer three"];

How do I achieve this in JavaScript? Thanks.

btror
  • Does this answer your question? [javascript split string by space, but ignore space in quotes (notice not to split by the colon too)](https://stackoverflow.com/questions/16261635/javascript-split-string-by-space-but-ignore-space-in-quotes-notice-not-to-spli) – Ivar Dec 31 '20 at 01:19

3 Answers


Did you try str.split()?

const array = message.split(/\s(?=")/);
// This regex splits on a whitespace character that is immediately followed by a ".
// Each resulting element still includes its surrounding quotes.

docs: https://www.w3schools.com/jsref/jsref_split.asp

Just in case you want to break down the regex:

/ start of regex
  \s escaped character: whitespace
  (?= start of positive lookahead
    " literal character: quotation mark
  ) end of positive lookahead
/ end of regex
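
As a rough illustration of the split above, here is a small sketch that also drops the surrounding quotes, since split() leaves them on each element. The `stripQuotes` helper is a made-up name for this example and not part of the answer; it assumes every element starts and ends with a double quote.

```javascript
const message = '"this is a question" "answer one" "answer two" "answer three"';

// Hypothetical helper: drop the first and last character (the quotes).
const stripQuotes = (part) => part.slice(1, -1);

const parts = message.split(/\s(?=")/); // each element still has its quotes
const array = parts.map(stripQuotes);

console.log(array);
// → ["this is a question", "answer one", "answer two", "answer three"]
```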

EDIT: Oskar Grosser pointed out in the comments that this fails when a quote has trailing whitespace or when two quotes are separated by more than one space. Here's a fix for that:

/(?<=[^\\]?")\s+(?=")/

Breakdown:

/ start of regex
  (?<= start of positive lookbehind
    [^ start of negated charset (meant to skip any \")
      \\ escaped literal character: backslash (\)
    ] end of negated charset
    ? quantifier: 0 or 1 (in case the quote is at the beginning of the string; note that this also makes the backslash check optional, so \" is not actually excluded)
    " literal character: quotation mark (")
  ) end of positive lookbehind
  \s escaped character: whitespace
  + quantifier: 1 or more (in case there are 2 or more whitespace characters instead of 1)
  (?= start of positive lookahead
    " literal character: quotation mark (")
  ) end of positive lookahead
/ end of regex

NOTE: lookbehind assertions were added in ES2018 and are not supported by older browsers, so check your target browsers before relying on this.
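
One way to cope with missing lookbehind support is to test for it at runtime and fall back to matching instead of splitting. The sketch below is only an illustration: `supportsLookbehind` and `splitQuoted` are hypothetical names, the regex is a simplified version that ignores escaped quotes, and the fallback assumes every quote is properly closed.

```javascript
// Hypothetical helper: returns true if this engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b'); // throws a SyntaxError on engines without lookbehind
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical wrapper: returns the quoted parts, quotes included.
function splitQuoted(str) {
  if (supportsLookbehind()) {
    // Built from a string so older engines don't fail while parsing this file.
    return str.split(new RegExp('(?<=")\\s+(?=")'));
  }
  // Fallback without lookbehind: match each "..." run instead of splitting.
  return str.match(/"[^"]*"/g) || [];
}

console.log(splitQuoted('"Trailing " "Regular"'));
console.log(splitQuoted('"Two"  "white-spaces inbetween"'));
```

Building the pattern with `new RegExp()` matters here: a regex literal containing a lookbehind would make the whole script fail to parse on an engine that doesn't know the syntax.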

Azure
  • This won't work as expected for quotes containing trailing white-space when another quote follows (e.g. `'"Trailing " "Regular"'`). It will also not work as expected for quotes separated by more than one white-space (e.g. `'"Two"  "white-spaces inbetween"'`). – Oskar Grosser Dec 31 '20 at 02:41
  • For a better breakdown of the regex, copy it into here https://regexr.com/ – Azure Dec 31 '20 at 04:14
2

Try this function I just cooked up:

let string = '"this is a question" "answer one" "answer two" "answer three"'
let string2 = '"this is a question " " answer one" " answer two" "answer three"'
let string3 = '"this is a question"random omitted "answer one" text between quotes "answer two" zzz "answer three"'

function splitString(string) {
  let wordArray = []
  let incompleteWord = ""
  let quotePos = 0 // 0 = outside a quoted part, 1 = inside one
  for (let i = 0; i < string.length; i++) {
    if (string.charAt(i) === '"') {
      if (quotePos === 0)
        quotePos = 1 // opening quote
      else {
        // closing quote: store the trimmed text and reset
        wordArray.push(incompleteWord.trim())
        incompleteWord = ""
        quotePos = 0
        continue
      }
    } else {
      // only collect characters that are inside quotes
      if (quotePos === 1)
        incompleteWord += string.charAt(i)
    }
  }
  return wordArray
}

console.log(splitString(string))
console.log(splitString(string2))
console.log(splitString(string3))
Eric McWinNEr
  • I did something very similar in Java, but I struggled to get my version to work in JS. This one works, thanks! – btror Dec 31 '20 at 01:35
  • It does have one problem: Characters between quotes are also added to `incompleteWord`. Checking if characters are part of the quote before adding them to `incompleteWord` should solve it. Great solution though! – Oskar Grosser Dec 31 '20 at 03:20
  • Thanks so much for pointing out @OskarGrosser I can't believe I didn't notice it. I also noticed my code wasn't trimming the characters between the quotes so I updated my code. Thanks again :) – Eric McWinNEr Dec 31 '20 at 15:17
1
  1. Get the quoted parts, including their quotes. This is done using String.split().
  2. (Optional) Remove the surrounding "" (double quotes) from each part. This can be done with String.replace() inside Array.map().

1. Getting the quotes

The following regular expression splits the string at the white-space between "" (double quotes), but does not treat \" (escaped double quote) as a boundary: /(?<=[^\\]")\s+(?=")/

Here is an explanation of it:

(?<=[^\\]") # Is preceded by not a backslash and a double quote
            # Better said: is preceded by an un-escaped double quote
\s+         # Consists of one or more white-space characters, and nothing else
(?=")       # Is followed by double quote

If there is nothing between two double quotes, or anything other than white-space, the string is not split there. This allows quoted parts with leading or trailing white-space (e.g. " This "), but not quoted parts consisting only of white-space.

var message = '"Containing \" escaped quote" "Trailing white-spaces here  " " Beginning and trailing white-spaces here " "  Beginning white-spaces here"';

console.log(message.split(/(?<=[^\\]")\s+(?=")/));

Why check on both sides? Wouldn't it suffice to only look ahead?

This becomes a problem when the input string has a quoted part with trailing white-space followed by another quoted part, e.g. "Trailing " "Regular".
Here, the first split happens right after "Trailing, removing the (white-space) before its closing " (double quote). The next split happens just before the third " (double quote), removing the (white-space) in front of it. The closing " (double quote) of "Trailing " therefore ends up as its own array element, since there is a split on both sides of it.

Also, without checking whether the preceding quote is escaped, escaped quotes cannot be supported.

Looking both ahead and behind solves this problem. However, it still doesn't handle " " (double quotes containing only white-space), which gets split apart. To my knowledge, splitting the input string so that such "empty" double quotes are kept intact is impossible using regular expressions.
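
As a possible alternative to splitting (this is not the method used in this answer, and the function name `extractQuoted` is made up for illustration), the quoted parts can be matched directly instead of looking for the gaps between them. A sketch, assuming every quote is properly closed and the environment supports String.matchAll():

```javascript
// Hypothetical helper: collect the text inside each "..." pair.
function extractQuoted(str) {
  // "      opening quote
  // (...)  capture: any run of non-quote, non-backslash chars or backslash escapes
  // "      closing quote
  const pattern = /"((?:[^"\\]|\\.)*)"/g;
  return Array.from(str.matchAll(pattern), (m) => m[1]);
}

console.log(extractQuoted('"a" " " "b"'));           // → ["a", " ", "b"]
console.log(extractQuoted('"Trailing " "Regular"')); // → ["Trailing ", "Regular"]
```

Because this matches the quoted parts themselves, a part that contains only white-space is returned like any other, and the surrounding quotes are already stripped by the capture group.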

Here is a demonstration of only looking ahead:

var message = '"Containing \" escaped quote" "Trailing white-spaces here  " " Beginning and trailing white-spaces here " "  Beginning white-spaces here"';

console.log(message.split(/\s+(?=")/));

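2. Removing the enclosing quotes

After retrieving the quoted strings, we can remove the starting and ending "" (double quotes) using String.replace() inside Array.map().

Removing the escaping backslash of \" (escaped double quote) is not needed: in a JavaScript string literal, \" is parsed as a plain " (double quote), so the resulting string contains no backslash. The backslash check in our splitting regular expression only matters when the input really contains a backslash before a quote, for example when it is written as \\" in source code or comes from user input.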
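
To make the difference concrete, here is a small sketch (the variable names are only for illustration) comparing a literal written with \" to one that really contains a backslash, using the splitting regular expression from above:

```javascript
const regex = /(?<=[^\\]")\s+(?=")/;

// In source code, \" inside a single-quoted literal is just a " character.
const plain = 'say \"hi\"';
console.log(plain.includes('\\')); // → false

// Written as \\" the string really contains a backslash before the quote.
const escaped = '"I escaped\\" " "next"';
console.log(escaped.split(regex));
// → two elements: "I escaped\" " and "next" (no split after the \")
```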

var message = '"Containing \" escaped quote" "Trailing white-spaces here  " " Beginning and trailing white-spaces here " "  Beginning white-spaces here"';

for (var i of message.split(/(?<=[^\\]")\s+(?=")/).map(i => i.replace(/^"|"$/g, '')))
  console.log(i);

Note that String.replace() with a regular expression that has the g (global) flag behaves the same as String.replaceAll() with that expression. Either way, a regular expression is required here so that only the leading and trailing quotes are matched.
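
A short sketch of why the anchors matter (the sample string and variable names are invented for this example): the anchored pattern removes only the outer quotes, while an unanchored global replace removes every quote.

```javascript
const part = '"He said "hi" to me"';

// Anchored alternation: only a quote at the very start or very end matches.
const outerOnly = part.replace(/^"|"$/g, '');

// Unanchored: every quote is removed (same result as part.replaceAll('"', '')).
const everyQuote = part.replace(/"/g, '');

console.log(outerOnly);  // → He said "hi" to me
console.log(everyQuote); // → He said hi to me
```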

Oskar Grosser
  • include [^\\\] in lookbehind. otherwise strings like `"I escaped\" "` would not work. – Azure Dec 31 '20 at 04:13
  • @EarlyBird And then replace the `\\` (backslash) at the end for the resulting unquoted string? Will do it right away. – Oskar Grosser Dec 31 '20 at 13:59
  • Just remembered that replacing the backslash of the escaped quote is not needed, as it is an escaped quote and thus parsed as a regular quote. – Oskar Grosser Dec 31 '20 at 14:23