var str = 'single words "fixed string of words"';
var astr = str.split(" "); // need fix
I would like the array to be like this:
var astr = ["single", "words", "fixed string of words"];
The accepted answer is not entirely correct. It separates on non-space characters like . and - and leaves the quotes in the results. A better approach, which also excludes the quotes, is to use a capturing group:
// The parentheses in the regex create a capturing group inside the quotes
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myString = 'single words "fixed string of words"';
var myArray = [];
do {
//Each call to exec returns the next regex match as an array
var match = myRegexp.exec(myString);
if (match != null)
{
//Index 1 in the array is the captured group if it exists
//Index 0 is the matched text, which we use if no captured group exists
myArray.push(match[1] ? match[1] : match[0]);
}
} while (match != null);
myArray will now contain exactly what the OP asked for:
single,words,fixed string of words
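For reference, the same capture-group idea can be written without the explicit exec loop. This is a minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020); the function name splitOnSpacesKeepingQuotes is hypothetical and not part of the original answer.

```javascript
// Sketch only: splitOnSpacesKeepingQuotes is a hypothetical helper name.
function splitOnSpacesKeepingQuotes(input) {
  // Same pattern as above: a run of non-space, non-quote characters,
  // or a double-quoted phrase whose contents land in capture group 1.
  const pattern = /[^\s"]+|"([^"]*)"/g;
  return Array.from(input.matchAll(pattern), (match) =>
    // Group 1 is undefined when the unquoted branch matched.
    match[1] !== undefined ? match[1] : match[0]
  );
}

const words = splitOnSpacesKeepingQuotes('single words "fixed string of words"');
console.log(words); // [ 'single', 'words', 'fixed string of words' ]
```

One behavioral difference: because this sketch compares against undefined, an empty quoted phrase ("") yields an empty string, whereas the truthiness test in the loop above falls back to match[0] and keeps the two quote characters.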
str.match(/\w+|"[^"]+"/g)
//single, words, "fixed string of words"
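To illustrate how much the choice of word pattern matters, here is a small comparison on a made-up input containing - and . (the sample string is an assumption for illustration). \w only covers letters, digits and underscore, so any other character is dropped and acts as a separator, while [^\s"] keeps everything except whitespace and double quotes.

```javascript
// Hypothetical sample input, chosen to contain '-' and '.'.
const sample = 'foo-bar a.b "x y"';

// \w+ stops at '-' and '.', so those characters disappear from the result.
const withWordClass = sample.match(/\w+|"[^"]+"/g);
console.log(withWordClass); // [ 'foo', 'bar', 'a', 'b', '"x y"' ]

// [^\s"]+ keeps every character that is not whitespace or a double quote.
const withNegatedClass = sample.match(/[^\s"]+|"[^"]+"/g);
console.log(withNegatedClass); // [ 'foo-bar', 'a.b', '"x y"' ]
```

Both variants still return the quoted phrase with its quotes, since match with the g flag returns whole matches rather than capture groups.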
This uses a mix of split and regex matching.
var str = 'single words "fixed string of words"';
var matches = /".+?"/.exec(str);
str = str.replace(/".+?"/, "").replace(/^\s+|\s+$/g, "");
var astr = str.split(" ");
if (matches) {
for (var i = 0; i < matches.length; i++) {
astr.push(matches[i].replace(/"/g, ""));
}
}
This returns the expected result, although a single regexp should be able to do it all.
// ["single", "words", "fixed string of words"]
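A quick way to see where this approach stops working is to wrap it in a function and feed it a string with more than one quoted phrase. The wrapper name splitMixed and the second input are hypothetical; the body is the code above, unchanged apart from the return statement.

```javascript
// Hypothetical wrapper around the split + regex code shown above.
function splitMixed(str) {
  var matches = /".+?"/.exec(str); // no g flag: only the first quoted phrase
  str = str.replace(/".+?"/, "").replace(/^\s+|\s+$/g, "");
  var astr = str.split(" ");
  if (matches) {
    for (var i = 0; i < matches.length; i++) {
      astr.push(matches[i].replace(/"/g, ""));
    }
  }
  return astr;
}

console.log(splitMixed('single words "fixed string of words"'));
// [ 'single', 'words', 'fixed string of words' ]

console.log(splitMixed('"a b" c "d e"'));
// [ 'c', '"d', 'e"', 'a b' ]
```

With two quoted phrases, only the first one is extracted, the second is split on its inner space, and the extracted phrase is appended at the end rather than kept in its original position.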
Update: here is an improved version of the method proposed by S.Mark:
var str = 'single words "fixed string of words"';
var aStr = str.match(/\w+|"[^"]+"/g), i = aStr.length;
while(i--){
aStr[i] = aStr[i].replace(/"/g,"");
}
// ["single", "words", "fixed string of words"]
ES6 solution supporting:
Code:
str.match(/\\?.|^$/g).reduce((p, c) => {
if(c === '"'){
p.quote ^= 1;
}else if(!p.quote && c === ' '){
p.a.push('');
}else{
p.a[p.a.length-1] += c.replace(/\\(.)/,"$1");
}
return p;
}, {a: ['']}).a
Output:
[ 'single', 'words', 'fixed string of words' ]
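Judging from the code alone, the reducer walks the string one character (or one backslash-escaped pair) at a time, toggles a quote flag on each unescaped ", starts a new word on a space outside quotes, and strips the backslash from escaped characters. The sketch below wraps it in a function so the behavior can be tried on other inputs; the name splitWithEscapes and the sample strings are assumptions.

```javascript
// Hypothetical wrapper around the reducer shown above.
const splitWithEscapes = (str) =>
  str.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
      p.quote ^= 1; // toggle "inside quotes" on an unescaped quote
    } else if (!p.quote && c === ' ') {
      p.a.push(''); // a space outside quotes starts a new word
    } else {
      p.a[p.a.length - 1] += c.replace(/\\(.)/, '$1'); // drop the escaping backslash
    }
    return p;
  }, { a: [''] }).a;

console.log(splitWithEscapes('single words "fixed string of words"'));
// [ 'single', 'words', 'fixed string of words' ]

// A backslash-escaped quote is kept as a literal quote character.
console.log(splitWithEscapes('say \\"hi\\" "to all"'));
// [ 'say', '"hi"', 'to all' ]
```

Note that consecutive spaces outside quotes each start a new word, so 'a  b' (two spaces) produces an empty string between 'a' and 'b'.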
This will split it into an array and strip off the surrounding quotes from any remaining string.
const parseWords = (words = '') =>
(words.match(/[^\s"]+|"([^"]*)"/gi) || []).map((word) =>
word.replace(/^"(.+(?="$))"$/, '$1'))
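A brief usage sketch follows. parseWords is repeated so the snippet runs on its own, and the inputs are illustrative only.

```javascript
// Repeated from above so this snippet is self-contained.
const parseWords = (words = '') =>
  (words.match(/[^\s"]+|"([^"]*)"/gi) || []).map((word) =>
    word.replace(/^"(.+(?="$))"$/, '$1'));

console.log(parseWords('single words "fixed string of words"'));
// [ 'single', 'words', 'fixed string of words' ]

// With no argument (or no matches) the "|| []" fallback yields an empty array.
console.log(parseWords()); // []
```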
This solution would work for both double (") and single (') quotes:
Code:
str.match(/[^\s"']+|"([^"]*)"/gmi)
// ["single", "words", "\"fixed string of words\""]
Here it shows how this regular expression would work: https://regex101.com/r/qa3KxQ/2
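Note that the pattern above keeps ' out of unquoted words but has no branch that matches a single-quoted phrase, and match with the g flag returns whole matches, so the surrounding quotes stay in the results. One possible way to treat both quote styles as phrase delimiters is sketched below; the helper name splitQuoted and the third alternative in the regex are my additions, not part of the answer, and the nullish coalescing operator (??) assumes an ES2020 environment.

```javascript
// Hypothetical helper: handles "double-quoted" and 'single-quoted' phrases.
function splitQuoted(input) {
  // Group 1 captures double-quoted content, group 2 single-quoted content.
  const pattern = /[^\s"']+|"([^"]*)"|'([^']*)'/g;
  // Prefer a captured group; fall back to the whole match for bare words.
  return Array.from(input.matchAll(pattern), (m) => m[1] ?? m[2] ?? m[0]);
}

console.log(splitQuoted('single \'two words\' "fixed string of words"'));
// [ 'single', 'two words', 'fixed string of words' ]
```

A caveat of treating ' as a delimiter: an apostrophe inside a word is not part of any match, so don't comes back as 'don' and 't'.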
Until I found @dallin's answer (this thread: https://stackoverflow.com/a/18647776/1904943), I was having difficulty processing strings with a mix of unquoted and quoted terms and phrases in JavaScript.
In researching that issue, I ran a number of tests.
Because this information was hard to find, I have collated it below; it may be useful to others who need to process strings containing quoted words in JavaScript.
let q = 'apple banana "nova scotia" "british columbia"';
Extract [only] quoted words and phrases:
// https://stackoverflow.com/questions/12367126/how-can-i-get-a-substring-located-between-2-quotes
// Note: [^']+ also matches double quotes, so this greedily spans both phrases as a single match.
const r = q.match(/"([^']+)"/g);
console.log('r:', r)
// r: Array [ "\"nova scotia\" \"british columbia\"" ]
console.log('r:', r.toString())
// r: "nova scotia" "british columbia"
// ----------------------------------------
// [alternate regex] https://www.regextester.com/97161
const s = q.match(/"(.*?)"/g);
console.log('s:', s)
// s: Array [ "\"nova scotia\"", "\"british columbia\"" ]
console.log('s:', s.toString())
// s: "nova scotia","british columbia"
Extract [all] unquoted, quoted words and phrases:
// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
const t = q.match(/\w+|"[^"]+"/g);
console.log('t:', t)
// t: Array(4) [ "apple", "banana", "\"nova scotia\"", "\"british columbia\"" ]
console.log('t:', t.toString())
// t: apple,banana,"nova scotia","british columbia"
// ----------------------------------------------------------------------------
// https://stackoverflow.com/questions/2817646/javascript-split-string-on-space-or-on-quotes-to-array
// [@dallin's answer (this thread)] https://stackoverflow.com/a/18647776/1904943
var myRegexp = /[^\s"]+|"([^"]*)"/gi;
var myArray = [];
do {
/* Each call to exec returns the next regex match as an array. */
var match = myRegexp.exec(q); // << "q" = my query (string)
if (match != null)
{
/* Index 1 in the array is the captured group if it exists.
* Index 0 is the matched text, which we use if no captured group exists. */
myArray.push(match[1] ? match[1] : match[0]);
}
} while (match != null);
console.log('myArray:', myArray, '| type:', typeof(myArray))
// myArray: Array(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
console.log(myArray.toString())
// apple,banana,nova scotia,british columbia
Work with a set (rather than an array):
// https://stackoverflow.com/questions/28965112/javascript-array-to-set
var mySet = new Set(myArray);
console.log('mySet:', mySet, '| type:', typeof(mySet))
// mySet: Set(4) [ "apple", "banana", "nova scotia", "british columbia" ] | type: object
Iterating over set elements:
mySet.forEach(x => console.log(x));
/* apple
* banana
* nova scotia
* british columbia
*/
// https://stackoverflow.com/questions/16401216/iterate-over-set-elements
const myArrayFromSet = Array.from(mySet);
for (let i=0; i < myArrayFromSet.length; i++) {
console.log(i + ':', myArrayFromSet[i])
}
/*
0: apple
1: banana
2: nova scotia
3: british columbia
*/
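A Set can also be iterated directly with for...of, without converting it to an array first, and it drops duplicate terms while preserving insertion order. A small sketch with a hypothetical term list that contains a repeat:

```javascript
// Hypothetical input with a duplicate term.
const terms = ['apple', 'banana', 'nova scotia', 'apple'];
const uniqueTerms = new Set(terms);

// for...of visits Set elements in insertion order.
const visited = [];
for (const term of uniqueTerms) {
  visited.push(term);
}

console.log(visited);          // [ 'apple', 'banana', 'nova scotia' ]
console.log(uniqueTerms.size); // 3
```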
Asides
The JavaScript output above is from the Firefox Developer Tools console (F12, from a web page). I created a blank HTML file that calls a .js file, which I edit with Vim as my IDE (see: Simple JavaScript IDE).
Based on my tests, the cloned set appears to be a deep copy (see: Shallow-clone an ES6 Map or Set).
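On the copy semantics: new Set(iterable) and Array.from(set) build a new container but copy element references rather than cloning the elements. With string elements like the ones used here the difference is not observable, which may be why the copy looks deep. A small sketch with object elements (hypothetical data) makes the sharing visible:

```javascript
// Hypothetical data: objects instead of strings, so sharing is observable.
const originalSet = new Set([{ name: 'nova scotia' }]);
const clonedSet = new Set(originalSet);

// The clone is a separate Set: adding to it leaves the original alone...
clonedSet.add({ name: 'british columbia' });
console.log(originalSet.size, clonedSet.size); // 1 2

// ...but the first element is the same object in both sets.
const [firstOriginal] = originalSet;
const [firstClone] = clonedSet;
console.log(firstOriginal === firstClone); // true
```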
I noticed the disappearing characters, too. I think you can include them - for example, to have it include "+" with the word, use something like "[\w\+]" instead of just "\w".
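A minimal sketch of that suggestion, using a made-up input that contains + characters (inside a character class the + does not need to be escaped, although escaping it is harmless):

```javascript
// Hypothetical sample containing '+' characters.
const text = 'c++ and "nova scotia"';

// Plain \w+ drops the '+' characters.
const plain = text.match(/\w+|"[^"]+"/g);
console.log(plain); // [ 'c', 'and', '"nova scotia"' ]

// Adding '+' to the character class keeps them with the word.
const extended = text.match(/[\w+]+|"[^"]+"/g);
console.log(extended); // [ 'c++', 'and', '"nova scotia"' ]
```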