
I have the following program:

var input = "This, is just a test! Every special ? character( including spaces.) should not be caught in the < > array I am about to form!"
var reg = /\W*[\s]/;

var words = input.split(reg);

console.log(words);

My expected output is:

[ 'This', 'is', 'just', ............ 'the', 'array', 'I', 'am', 'about', 'to', 'form' ]

However the output that I get is:

[ 'This', 'is', 'just', ............ 'the', 'array', 'I', 'am', 'about', 'to', 'form!' ]

As you can see, the last word, 'form!', is not split properly: it still includes the '!'. Only the last word is affected; every other word is split correctly.

How should I solve this using regular expressions?

Haris Ghauri
  • You can use `var words = input.match(/[a-zA-Z0-9]+/g)` [**Demo**](https://regex101.com/r/mN6eZ7/2) – Tushar Jan 20 '16 at 14:47
  • There is no space at the end. Thus, use `\W*(?:\s|$)` (if you want to keep your logic). Why not use `/\W+/`? See [this demo](https://regex101.com/r/mN6eZ7/1) – Wiktor Stribiżew Jan 20 '16 at 14:48
  • Your regular expression matches zero or more non-word characters (`\W*`) followed by a single whitespace character (`[\s]`). At the end of your string, there's only a non-word character (`!`) with no whitespace following it, so your regular expression doesn't match there. – Anthony Grist Jan 20 '16 at 14:49
  • Shall we close it as a dupe of http://stackoverflow.com/questions/19509635/ruby-string-split-into-words-ignoring-all-special-characters-simpler-query? It is for Ruby, but the regex is the same. Or [this one for Java](http://stackoverflow.com/questions/26351680/how-to-split-on-any-non-word-character-with-0-or-more-trailing-and-leading-white) – Wiktor Stribiżew Jan 20 '16 at 14:51
  • Ok, let it be Ruby since it uses the same syntax. @Haris, if it does not work for you, please update the question. – Wiktor Stribiżew Jan 20 '16 at 14:53
  • I was initially using `/\W+/`, but it included an empty string after the last word in the array. – Haris Ghauri Jan 20 '16 at 14:53
  • So the last elements become `'about', 'to', 'form', '' ]` – Haris Ghauri Jan 20 '16 at 14:54
  • You can see there is an empty string after 'form'. Where does that come from if I use `/\W+/`? – Haris Ghauri Jan 20 '16 at 14:55
  • `\W*(?:\s|$)` adds an empty string as well // it does not even work properly – CoderPi Jan 20 '16 at 14:58
  • @Tushar I used the method you provided. It works perfectly but I want to use regular expressions. – Haris Ghauri Jan 20 '16 at 15:00
  • @CodeiSir Yup, both do not work properly. The only thing that does work is the match method provided by Tushar. But I want to do it using regular expressions. – Haris Ghauri Jan 20 '16 at 15:01
  • The method I gave also uses RegEx. If you want to use `split()` instead of `match()`, use the RegEx provided by @WiktorStribiżew or the one from the question this is a dupe of. – Tushar Jan 20 '16 at 15:03
  • @HarisGhauri: Just add `.filter(Boolean)` to the array. `console.log(words.filter(Boolean));` -> `["This", "is", "just", "a", "test", "Every", "special", "character", "including", "spaces", "should", "not", "be", "caught", "in", "the", "array", "I", "am", "about", "to", "form"]` – Wiktor Stribiżew Jan 20 '16 at 15:05
  • @WiktorStribiżew Thank you !! :) – Haris Ghauri Jan 20 '16 at 15:07
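
For reference, here is a minimal sketch that puts the suggestions from the comments side by side, using the input string from the question. The variable names (`original`, `withEmpty`, `filtered`, `matched`) are only illustrative. It also shows where the trailing empty string comes from: `split()` returns the text between separators, and since the final `!` is itself matched as a separator, the "text" after it is the empty string.

```javascript
// Same input string as in the question.
const input = "This, is just a test! Every special ? character( including spaces.) should not be caught in the < > array I am about to form!";

// 1. The pattern from the question requires a whitespace character after the
//    non-word characters, so the final "!" (followed by nothing) never matches.
const original = input.split(/\W*[\s]/);
console.log(original[original.length - 1]); // form!

// 2. /\W+/ does match the final "!", but split() then reports the empty
//    string that follows that last separator.
const withEmpty = input.split(/\W+/);
console.log(withEmpty.slice(-2)); // [ 'form', '' ]

// 3. filter(Boolean) drops falsy entries, which removes the empty string.
const filtered = input.split(/\W+/).filter(Boolean);
console.log(filtered[filtered.length - 1]); // form

// 4. Alternatively, match the words directly instead of splitting on the gaps.
//    match() returns null when nothing matches, hence the "|| []" fallback.
const matched = input.match(/[a-zA-Z0-9]+/g) || [];
console.log(matched.length); // 22
```

Approaches 3 and 4 give the same array for this input. They can differ on other inputs, for example with underscores, which `\W` treats as word characters but `[a-zA-Z0-9]` does not match.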
