37
var str = 'single words "fixed string of words"';
var astr = str.split(" "); // need fix

I would like the array to be like this:

var astr = ["single", "words", "fixed string of words"];
Remi

9 Answers

42

The accepted answer is not entirely correct. It splits on non-space characters such as . and - and leaves the quotes in the results. A better approach, which also excludes the quotes, is to use a capturing group, like so:

//The parentheses in the regex create a capturing group for the text inside the quotes
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myString = 'single words "fixed string of words"';
var myArray = [];

do {
    //Each call to exec returns the next regex match as an array
    var match = myRegexp.exec(myString);
    if (match != null)
    {
        //Index 1 in the array is the captured group if it exists
        //Index 0 is the matched text, which we use if no captured group exists
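        //Comparing against undefined keeps an empty quoted string ("") as '' instead of '""'
        myArray.push(match[1] !== undefined ? match[1] : match[0]);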
    }
} while (match != null);

myArray will now contain exactly what the OP asked for:

single,words,fixed string of words
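
The same idea can be sketched as a reusable function. This is a minimal sketch, assuming an environment that supports `String.prototype.matchAll` (ES2020); the function name `splitOutsideQuotes` is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper name; the pattern is the one used in the answer above,
// minus the redundant "i" flag.
function splitOutsideQuotes(input) {
  const pattern = /[^\s"]+|"([^"]*)"/g;
  const parts = [];
  for (const match of input.matchAll(pattern)) {
    // match[1] is undefined when the unquoted alternative matched,
    // and a (possibly empty) string when the quoted alternative matched.
    parts.push(match[1] !== undefined ? match[1] : match[0]);
  }
  return parts;
}

console.log(splitOutsideQuotes('single words "fixed string of words"'));
// → [ 'single', 'words', 'fixed string of words' ]
```

Because `matchAll` returns every match along with its capture groups, there is no need to manage the `exec` loop or the regex's `lastIndex` by hand.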
dallin
  • 8
    Works well, thank you. Just to say the 'i' switch looks to be redundant. – Martin Connell Jul 12 '16 at 15:20
  • 1
    var myRegexp = /[^\s"]+|"(?:\\"|[^"])*"/g ...allows for \" (escaped quotes within quotes) – Nik Feb 19 '21 at 16:23
  • 1
    I had posted a question about this exact issue, then deleted it (no replies or answers) after a more dedicated search turned up this excellent answer. As mentioned, the solution above does exactly what the OP asked (`'apple banana "nova scotia" "british columbia"'` >> `"apple", "banana", "nova scotia", "british columbia"`), and I learned something new vis-à-vis JavaScript! :-) – Victoria Stuart Feb 25 '21 at 19:17
33
str.match(/\w+|"[^"]+"/g)

//single, words, "fixed string of words"
YOU
  • 8
    This seems to split on '.' and '-' as well as spaces. This should probably be `str.match(/\S+|"[^"]+"/g)` – Awalias Apr 09 '13 at 13:22
  • There's another problem with this if it has to handle escaped quotes. For example: `'single words "fixed string of \"quoted\" words"'` Even with Awalias' correction, this gives: `["single", "words", ""fixed", "string", ""of", "words""]` You'd need to handle escaped quotes, but not trip up and grab an escaped backslash. I think it would eventually get more complicated than you'd really want to handle with a regexp. – jep Jun 20 '13 at 14:55
  • 2
    @Awalias I have a better answer below. Your regex example actually should be /[^\s"]+|"([^"]*)"/g. Yours will still split on spaces in quoted areas. I added an answer that fixes this and removes the quotation marks from the results like the OP asked for. – dallin Sep 06 '13 at 00:02
  • 1
    If you want to allow escaping the quotes, see [this other SO question](https://stackoverflow.com/questions/4031900). – bitinerant Sep 14 '19 at 13:57
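
For readers who do need escaped quotes, one possible approach is sketched here. It assumes the convention that a backslash escapes the next character inside a quoted span only; the helper name `splitWithEscapes` is hypothetical, and `matchAll` requires ES2020. It is not taken from any of the answers in this thread.

```javascript
// Hypothetical helper; the backslash-escape convention is an assumption.
function splitWithEscapes(input) {
  // Alternative 1: a run of characters that are neither whitespace nor quotes.
  // Alternative 2: a quoted span in which "\x" (backslash + any character)
  // is consumed as a unit, so \" does not end the span and \\ is not misread.
  const pattern = /[^\s"]+|"((?:\\.|[^"\\])*)"/g;
  return Array.from(input.matchAll(pattern), (m) =>
    m[1] !== undefined
      ? m[1].replace(/\\(.)/g, '$1') // drop the escaping backslash
      : m[0]
  );
}

// String.raw keeps the backslashes exactly as typed.
const escapedInput = String.raw`single words "fixed string of \"quoted\" words"`;
console.log(splitWithEscapes(escapedInput));
// → [ 'single', 'words', 'fixed string of "quoted" words' ]
```

Note that in an ordinary JavaScript string literal, `'\"'` is just `"`; the backslash only reaches the regex if it is written as `\\` or the literal uses `String.raw`.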
12

This uses a mix of split and regex matching.

var str = 'single words "fixed string of words"';
var matches = /".+?"/.exec(str);
str = str.replace(/".+?"/, "").replace(/^\s+|\s+$/g, "");
var astr = str.split(" ");
if (matches) {
    for (var i = 0; i < matches.length; i++) {
        astr.push(matches[i].replace(/"/g, ""));
    }
}

This returns the expected result, although a single regexp should be able to do it all.

// ["single", "words", "fixed string of words"]

Update: this is an improved version of the method proposed by S.Mark:

var str = 'single words "fixed string of words"';
var aStr = str.match(/\w+|"[^"]+"/g), i = aStr.length;
while(i--){
    aStr[i] = aStr[i].replace(/"/g,"");
}
// ["single", "words", "fixed string of words"]
Sean Kinsey
  • There's a problem with the improved version, where if you use a non-word-character like "#" it will disappear. – Tuhis Jun 26 '12 at 22:03
  • This is a good answer, but if you want to do it all via regex and have the quotes removed, I added a new answer that does this and doesn't require looping through every result to strip out the quotes afterwards. – dallin Sep 06 '13 at 00:25
5

This might be a complete solution: https://github.com/elgs/splitargs

3

ES6 solution supporting:

  • Splits by space, except inside quotes
  • Removes quotes, except backslash-escaped quotes
  • Escaped quotes become plain quotes
  • Quotes can appear anywhere

Code:

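// Tokenize into single characters, keeping a backslash together with the character it escapes
str.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
        // An unescaped quote toggles the "inside quotes" state and is dropped
        p.quote ^= 1;
    } else if (!p.quote && c === ' ') {
        // A space outside quotes starts a new item
        p.a.push('');
    } else {
        // Anything else is appended to the current item, minus any escaping backslash
        p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1");
    }
    return p;
}, {a: ['']}).a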

Output:

[ 'single', 'words', 'fixed string of words' ]
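
To illustrate the escaped-quote and "quotes anywhere" behaviour listed above, here is the same reduce wrapped in a function and run on a second input. This is only a sketch: the name `splitArgsSketch` and the sample string are made up for illustration, and the state flag is written as a boolean instead of the `^= 1` toggle.

```javascript
// Hypothetical wrapper around the reduce-based approach from this answer.
function splitArgsSketch(str) {
  return str.match(/\\?.|^$/g).reduce((state, token) => {
    if (token === '"') {
      state.inQuote = !state.inQuote;      // unescaped quote: toggle and drop it
    } else if (!state.inQuote && token === ' ') {
      state.items.push('');                // space outside quotes: start a new item
    } else {
      // "\x" tokens lose their backslash; everything else is appended as-is
      state.items[state.items.length - 1] += token.replace(/\\(.)/, '$1');
    }
    return state;
  }, { items: [''], inQuote: false }).items;
}

// String.raw keeps the backslash in foo\"bar.
console.log(splitArgsSketch(String.raw`say "hello world" foo\"bar pre"fix ed"post`));
// → [ 'say', 'hello world', 'foo"bar', 'prefix edpost' ]
```

One consequence of this design is that consecutive spaces outside quotes produce empty strings in the result, so callers may want to filter those out.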
Tsuneo Yoshioka
2

This will split it into an array and strip off the surrounding quotes from any remaining string.

const parseWords = (words = '') =>
    (words.match(/[^\s"]+|"([^"]*)"/gi) || []).map((word) => 
        word.replace(/^"(.+(?="$))"$/, '$1'))
tim.breeding
0

This solution is meant to handle both double (") and single (') quotes:

Code:

str.match(/[^\s"']+|"([^"]*)"/gmi)

// ["single", "words", "\"fixed string of words\""]

Here it shows how this regular expression would work: https://regex101.com/r/qa3KxQ/2
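
Note that the pattern above only has a quoted alternative for double quotes, and `match` with the `g` flag returns whole matches, so the surrounding quotes are kept. A possible variant that groups both quote styles and strips the quotes is sketched below; the name `splitQuoted` is hypothetical, and `matchAll` and `??` assume an ES2020 environment.

```javascript
// Hypothetical helper: one capturing alternative per quote style.
function splitQuoted(input) {
  const pattern = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  // m[1] is set for "double-quoted" spans, m[2] for 'single-quoted' spans;
  // both are undefined for a plain word, so we fall back to the full match.
  return Array.from(input.matchAll(pattern), (m) => m[1] ?? m[2] ?? m[0]);
}

console.log(splitQuoted(`single 'two words' "fixed string of words"`));
// → [ 'single', 'two words', 'fixed string of words' ]
```

A known limitation of this sketch: an apostrophe inside a word (as in `don't`) is treated as an unmatched quote and splits the word.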

julianYaman
0

Until I found @dallin's answer (in this thread: https://stackoverflow.com/a/18647776/1904943), I was having difficulty processing strings with a mix of unquoted and quoted terms and phrases in JavaScript.

In researching that issue, I ran a number of tests.

Because this information was hard to find, I have collated it below; it may be useful to others looking for ways to process strings containing quoted words in JavaScript.


let q = 'apple banana "nova scotia" "british columbia"';

Extract [only] quoted words and phrases:

// https://stackoverflow.com/questions/12367126/how-can-i-get-a-substring-located-between-2-quotes
const r = q.match(/"([^']+)"/g);
console.log('r:', r)
// r: Array [ "\"nova scotia\" \"british columbia\"" ]
console.log('r:', r.toString())
// r: "nova scotia" "british columbia"

// ----------------------------------------

// [alternate regex] https://www.regextester.com/97161
const s = q.match(/"(.*?)"/g);
console.log('s:', s)
// s: Array [ "\"nova scotia\"", "\"british columbia\"" ]
console.log('s:', s.toString())
// s: "nova scotia","british columbia"

Extract [all] unquoted, quoted words and phrases:

// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
const t = q.match(/\w+|"[^"]+"/g);
console.log('t:', t)
// t: Array(4) [ "apple", "banana", "\"nova scotia\"", "\"british columbia\"" ]
console.log('t:', t.toString())
// t: apple,banana,"nova scotia","british columbia"

// ----------------------------------------------------------------------------

// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
// [@dallin's answer (this thread)] https://stackoverflow.com/a/18647776/1904943

var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myArray = [];

do {
    /* Each call to exec returns the next regex match as an array. */
    var match = myRegexp.exec(q);    // << "q" = my query (string)
    if (match != null)
    {
        /* Index 1 in the array is the captured group if it exists.
         * Index 0 is the matched text, which we use if no captured group exists. */
        /* Comparing against undefined keeps an empty quoted string ("") as ''. */
        myArray.push(match[1] !== undefined ? match[1] : match[0]);
    }
} while (match != null);

console.log('myArray:', myArray, '| type:', typeof(myArray))
// myArray: Array(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
console.log(myArray.toString())
// apple,banana,nova scotia,british columbia

Work with a set (rather than an array):

// https://stackoverflow.com/questions/28965112/javascript-array-to-set
var mySet = new Set(myArray);
console.log('mySet:', mySet, '| type:', typeof(mySet))
// mySet: Set(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object

Iterating over set elements:

mySet.forEach(x => console.log(x));
/* apple
 * banana
 * nova scotia
 * british columbia
 */

// https://stackoverflow.com/questions/16401216/iterate-over-set-elements
const myArrayFromSet = Array.from(mySet);

for (let i=0; i < myArrayFromSet.length; i++) {
    console.log(i + ':', myArrayFromSet[i])
}
/*
 0: apple
 1: banana
 2: nova scotia
 3: british columbia 
 */

Asides

  • The JavaScript output above is from the Firefox Developer Tools (F12, from a web page). I created a blank HTML file that calls a .js file, which I edit with Vim as my IDE. (See: Simple JavaScript IDE.)

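  • Based on my tests, the set created from the array is independent of it: adding or removing elements in one does not affect the other. Note that this is still a shallow copy; for strings the distinction does not matter, but object elements would be shared by reference. (See: Shallow-clone an ES6 Map or Set.)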
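
A small sketch of what copying a Set does and does not duplicate. The sample values are made up for illustration. For primitives such as the strings used in this answer, the copy behaves independently; for objects, both sets hold the same reference.

```javascript
// Illustrative data only; one object element and one string element.
const originalSet = new Set([{ name: 'apple' }, 'banana']);
const clonedSet = new Set(originalSet);

// Membership is independent: adding to the clone leaves the original alone.
clonedSet.add('cherry');
console.log(originalSet.size, clonedSet.size);
// → 2 3

// Object elements are shared: the clone holds the same reference.
const [firstOriginal] = originalSet;
const [firstClone] = clonedSet;
firstClone.name = 'apricot';
console.log(firstOriginal.name);
// → apricot
```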

Victoria Stuart
-1

I noticed the disappearing characters, too. I think you can include them - for example, to have it include "+" with the word, use something like "[\w\+]" instead of just "\w".
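
A quick comparison may help here. This is only a sketch with a made-up sample string: it contrasts `\w+`, the widened class `[\w+]+` suggested above, and the `[^\s"]+` class used elsewhere in this thread, which keeps every non-space, non-quote character without listing them one by one.

```javascript
// Made-up sample containing characters that \w does not match.
const sample = 'c++ #tag node.js "two words"';

// \w+ silently drops "+", "#" and "." because they are not word characters.
const wordOnly = sample.match(/\w+|"[^"]+"/g);
console.log(wordOnly);
// → [ 'c', 'tag', 'node', 'js', '"two words"' ]

// Adding "+" to the class keeps "c++", but "#" and "." are still lost.
const wordPlus = sample.match(/[\w+]+|"[^"]+"/g);
console.log(wordPlus);
// → [ 'c++', 'tag', 'node', 'js', '"two words"' ]

// Excluding only whitespace and quotes keeps all of them.
const nonSpace = sample.match(/[^\s"]+|"[^"]+"/g);
console.log(nonSpace);
// → [ 'c++', '#tag', 'node.js', '"two words"' ]
```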

user655489