
Is there a way to return all words in a given string? The best solution I have found so far is to use the `match` method with a regex that matches every run of at least one non-whitespace character (`/\S+/g`).

The issue with this method is that it includes commas, periods, etc. as part of the words. If I use a RegExp with `\w` instead, it leaves out periods and commas, but it splits "don't" into two words because of the apostrophe.
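To make the two behaviours concrete, here is a small sketch running both patterns over the example sentence from this question (with a trailing period added, as an assumption, so the period problem shows up too):

```javascript
// The question's example sentence, with a trailing period added.
const sentence = "I don't want to go, mom.";

// \S+ keeps the apostrophe but also keeps trailing punctuation.
const bySpace = sentence.match(/\S+/g);
// → ["I", "don't", "want", "to", "go,", "mom."]

// \w+ drops the punctuation but splits "don't" at the apostrophe.
const byWordChar = sentence.match(/\w+/g);
// → ["I", "don", "t", "want", "to", "go", "mom"]

console.log(bySpace, byWordChar);
```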

Is there a reliable and simple solution to this issue?

For example: "I don't want to go, mom". This should return the words [I, don't, want, to, go, mom]
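One possible approach, sketched under the assumption that a "word" is a run of `\w` characters (ASCII letters, digits and underscore), optionally joined to further runs by a single straight or curly apostrophe. The helper name `getWords` is hypothetical:

```javascript
// Assumed definition of a word: \w+ runs joined by ' or ’ (U+2019).
const WORD = /\w+(?:['\u2019]\w+)*/g;

function getWords(text) {
  // match() returns null when nothing matches, so fall back to [].
  return text.match(WORD) ?? [];
}

console.log(getWords("I don't want to go, mom"));
// → ["I", "don't", "want", "to", "go", "mom"]
```

Apostrophes at the edges of a word (e.g. `'tis` or `students'`) are dropped under this definition, and non-ASCII letters are not covered by `\w`.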

kiranvj
user737163
  • What sorts of words containing commas would you want to keep as a single word, rather than counting them as two separate words? I suppose for periods you're thinking of abbreviations like `U.S.` or something like that? – CertainPerformance Mar 13 '21 at 16:30
  • Does this answer your question? [JavaScript break sentence by words](https://stackoverflow.com/questions/18473326/javascript-break-sentence-by-words) – Alexander Hemming Mar 13 '21 at 16:31
  • @AlexanderHemming No. All the solutions have flaws I spoke about in my post. – user737163 Mar 13 '21 at 16:32
  • Generally, you'll first need to list how you want to treat all the edge-cases. The issue is that you just say "do it right", but you probably haven't considered a lot of rather difficult edge-cases. Speaking of which, "edge-cases" includes a hyphen. Is that two words? One? – ASDFGerte Mar 13 '21 at 16:35
  • Please elaborate on what sorts of words with commas you want to permit. I don't think there's a good way to tell words that end with a `.` (e.g. `Dr.`) apart from words at the end of a sentence. – CertainPerformance Mar 13 '21 at 16:35
  • What is the problem with splitting only on spaces with the split method? It will not turn "don't" into "don" + "'t". – Alexander Hemming Mar 13 '21 at 16:36
  • @AlexanderHemming It will produce empty "words" when there are consecutive spaces, and it won't split on newlines, tabs, etc. – user737163 Mar 13 '21 at 16:37
  • I once did something like this, though I don't remember all the steps. I first replaced every `,` with a space, then replaced every "period followed by a space" (`. `). After that I used `str.split(" ")` to get the words. – kiranvj Mar 13 '21 at 16:40
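A sketch of the approach described in kiranvj's comment, with two assumptions of mine: a period at the very end of the string is also treated as a separator, and the text is split on runs of whitespace (`/\s+/`) with empty strings filtered out, which addresses the newline concern raised in the comments. `splitWords` is a hypothetical name:

```javascript
function splitWords(text) {
  return text
    // 1. Replace every comma with a space.
    .replace(/,/g, " ")
    // 2. Replace every period followed by whitespace or the end of the string.
    //    Periods inside tokens such as "U.S" are left alone.
    .replace(/\.(\s|$)/g, " ")
    // 3. Split on runs of whitespace (spaces, tabs, newlines).
    .split(/\s+/)
    // Leading/trailing whitespace leaves empty strings; drop them.
    .filter((w) => w.length > 0);
}

console.log(splitWords("I don't want to go, mom."));
// → ["I", "don't", "want", "to", "go", "mom"]
```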

3 Answers

4

Would this work?

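// Remove all periods and commas, split on runs of whitespace, drop empty strings
const words = mystr.replace(/[.,]/g, "").split(/\s+/).filter(Boolean);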

I would have posted this as a comment, but I don't have 50 reputation.

2

Use word boundaries (`\b`) in a regex together with the `match` method:

const matches = "I don't want to go, mom.".match(/(\b[^\s]+\b)/g);

console.log(matches);
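One limitation worth knowing: in JavaScript, `\w` and therefore `\b` only recognise ASCII letters, digits and underscore, so accented words are cut at the first non-ASCII letter (with the pattern above, "café" comes back as "caf"). A possible Unicode-aware variant, assuming a runtime that supports Unicode property escapes (`\p{L}` etc. with the `u` flag, ES2018), matches runs of letters, combining marks and digits joined by apostrophes instead of relying on `\b`:

```javascript
// The answer's pattern, for comparison (ASCII-only \b).
const asciiBoundary = /(\b[^\s]+\b)/g;

// Assumed definition of a word: letters (\p{L}), combining marks (\p{M})
// and digits (\p{N}) from any script, optionally joined by ' or ’.
const unicodeWord = /[\p{L}\p{M}\p{N}]+(?:['\u2019][\p{L}\p{M}\p{N}]+)*/gu;

const text = "C'est un café à Zürich.";

const asciiResult = text.match(asciiBoundary) ?? [];
const unicodeResult = text.match(unicodeWord) ?? [];

console.log(asciiResult);   // "café" is truncated to "caf"
console.log(unicodeResult); // → ["C'est", "un", "café", "à", "Zürich"]
```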
Vishnudev Krishnadas
0

I don't know if this is exactly what you are looking for, but using just the example string you provided, this worked:

let myString = "I don't want to go, mom";
// Remove every comma (a plain string pattern would only replace the first one)
myString = myString.replace(/,/g, '');
const wordsArray = myString.split(' ');
console.log({ wordsArray });

But be aware that there are many additional cases:

Is "two-handed" one word or two: ['two', 'handed'], ['two-handed'] or ['twohanded']?

For "Mrs. Foo" or "Dr. bar", are you expecting ['Mrs', 'Foo'], ['Mrs. Foo'], ['Mrs.Foo'] or ['MrsFoo']?
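For what it's worth, the Unicode standard defines default answers to these questions in its word-boundary rules (UAX #29), and modern JavaScript exposes them through `Intl.Segmenter`. A sketch assuming a runtime that supports it (ES2022, e.g. Node.js 16+ or current browsers); `segmentWords` is a hypothetical helper:

```javascript
// Intl.Segmenter with granularity "word" applies the Unicode
// word-boundary rules (UAX #29) for the given locale.
function segmentWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // Each segment is an object { segment, index, input, isWordLike };
  // isWordLike is false for spaces and punctuation.
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike)
    .map((s) => s.segment);
}

console.log(segmentWords("I don't want to go, mom"));
// → ["I", "don't", "want", "to", "go", "mom"]
console.log(segmentWords("A two-handed sword, Mrs. Foo"));
// → ["A", "two", "handed", "sword", "Mrs", "Foo"]
```

Under those rules a hyphen splits "two-handed" into two words and the period in "Mrs." is treated as punctuation, so if you want different answers to the cases above you will still need your own rules.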

I'd appreciate any feedback.

Richard Garcia