
I have this example string:

var string = 'This is a süPer NICE Sentence, am I right?';

The result has to be:

this, is, süper, nice, sentence

Requirements:

  1. 5 words max
  2. words that contain at least 2 characters
  3. comma separated
  4. takes care of special characters such as ü (this is not currently happening)
  5. all in lowercase (this is not currently happening)

This is my current script (you can test it in jsfiddle):

var string = 'This is a süPer NICE Sentence, am I right?';
var words;
words = string.replace(/[^a-zA-Z\s]/g,function(str){return '';});
words = words.match(/\w{2,}/g);

if(words != null) {
    //5 words maximum
    words = words.slice(0,5);
    if(words.length) {
        console.log(words.join(', ')); //should print: this, is, süper, nice, sentence
    }
}
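Checking the two regex steps separately on the sample string shows where the ü goes missing. This is a small diagnostic that reuses only the patterns from the script; `stripped` is just a local name for illustration.

```javascript
const string = 'This is a süPer NICE Sentence, am I right?';

// Step 1: [^a-zA-Z\s] treats ü as "not a letter", so replace() deletes it
const stripped = string.replace(/[^a-zA-Z\s]/g, '');
console.log(stripped); // This is a sPer NICE Sentence am I right

// Step 2: even if ü survived step 1, \w only covers [A-Za-z0-9_],
// so \w{2,} cannot match a word containing ü as a whole
console.log('süPer'.match(/\w{2,}/g)); // [ 'Per' ]
```

So both the character class in `replace()` and the `\w` in `match()` would need to become Unicode-aware; `toLowerCase()` itself is not involved.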

What would be the best way to convert the matched words into lowercase before the join?

Andres SK

5 Answers


The answer is definitely toLowerCase(), but I think the best place to run it is right at the end rather than the beginning (fewer items to operate on):

if(words != null) {
    //5 words maximum
    words = words.slice(0,5);
    if(words.length) {
        console.log(words.join(', ').toLowerCase()); //here
    }
}

toLowerCase() is, as far as I know, Unicode-friendly. It is your regex that strips anything that is not a-z or A-Z.

The asker found this link helpful for resolving the regex issue: Regular expression to match non-English characters?
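A minimal sketch of the regex fix, assuming an ES2018+ engine: Unicode property escapes such as `\p{L}` (any Unicode letter) only work with the `u` flag, which postdates this 2015 thread. It keeps the question's replace-then-match structure, swaps the ASCII classes for `\p{L}`, and lowercases once at the end as this answer suggests. `firstWords` is a hypothetical helper name.

```javascript
// Sketch assuming an ES2018+ engine: \p{L} (any Unicode letter) needs the `u` flag.
// firstWords is a hypothetical helper name, not from the original thread.
function firstWords(text, max) {
  // Drop everything that is not a letter or whitespace; unlike [^a-zA-Z\s],
  // [^\p{L}\s] keeps letters such as ü (digits and punctuation are still dropped)
  const lettersOnly = text.replace(/[^\p{L}\s]/gu, '');
  // Keep runs of at least 2 letters (requirement 2)
  const words = lettersOnly.match(/\p{L}{2,}/gu);
  if (words === null) {
    return '';
  }
  // At most `max` words, comma separated, lowercased once at the end
  return words.slice(0, max).join(', ').toLowerCase();
}

console.log(firstWords('This is a süPer NICE Sentence, am I right?', 5));
// this, is, süper, nice, sentence
```

If the input might contain a decomposed ü (a plain u followed by a combining diaeresis), calling `text.normalize('NFC')` first should compose it into a single letter so `\p{L}` keeps it.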

Gray
  • toLowerCase() is deleting special characters (ü)... I updated the example to consider that as well. – Andres SK Jul 14 '15 at 17:39
  • @andufo `words = string.replace(/[^a-zA-Z\s]/g,function(str){return '';});` is what is deleting those characters. – Gray Jul 14 '15 at 17:40
  • how could the regex consider special characters as well? – Andres SK Jul 14 '15 at 17:43
  • @andufo I'm not great with regex, maybe this: http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters Might have to exclude numbers too... not too sure. – Gray Jul 14 '15 at 17:45
  • Awesome. Glad I could help. – Gray Jul 14 '15 at 17:52
  • Just FYI, this is not correct because `string.replace(/[^a-zA-Z\s]/g,function(str){return '';});` will leave `This is a sPer NICE Sentence am I right` **eating away** `ü` in `süPer` – anubhava Jul 14 '15 at 18:09
  • @anubhava you're right, but I used the solution available in the link for the correct conversion which ended up being the perfect regex. – Andres SK Jul 14 '15 at 22:41
  • Can you update the question with your reworked solution to benefit others. – anubhava Jul 15 '15 at 06:21

Just use .toLowerCase().

var string = 'This is a süPer NICE Sentence, am I right?';
string = string.toLowerCase();
var words = string.split(' ');

//5 words maximum
words = words.slice(0,5);

console.log(words.join(', ')); //prints: this, is, a, süper, nice ("a" is not filtered out)

The special characters were being filtered out by the regex. If you know the words are separated by single spaces, just use string.split(' ').
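The split-based approach can be extended to drop one-letter words (requirement 2) and trailing punctuation such as the comma in "Sentence,". This is a sketch, not the answerer's code; it assumes whitespace-separated words and an ES2018+ engine for the `\p{L}` class, and `firstWordsBySplit` is a hypothetical name.

```javascript
// Sketch assuming whitespace-separated words and an ES2018+ engine (\p{L} + `u` flag).
// firstWordsBySplit is a hypothetical helper name.
function firstWordsBySplit(text, max) {
  return text
    .toLowerCase()
    .split(/\s+/)                                  // split on any run of whitespace
    .map((word) => word.replace(/[^\p{L}]/gu, '')) // strip ',', '?' and other non-letters
    .filter((word) => word.length >= 2)            // at least 2 characters (UTF-16 units)
    .slice(0, max)                                 // at most `max` words
    .join(', ');                                   // comma separated
}

console.log(firstWordsBySplit('This is a süPer NICE Sentence, am I right?', 5));
// this, is, süper, nice, sentence
```

Splitting on `/\s+/` instead of `' '` avoids empty entries when words are separated by several spaces or tabs; any empty strings that remain (for example from leading whitespace) are removed by the length filter.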

asmockler
  • Sorry, I added a special consideration. The toLowerCase() is deleting special characters (ü)... I updated the example. – Andres SK Jul 14 '15 at 17:38
  • Just noticed that `replace(/[^a-zA-Z\s]/g,function(str){return '';})` is removing the character. How could I consider the ü character in that regex? – Andres SK Jul 14 '15 at 17:43

Just lowercase the string from the start:

string.toLowerCase().replace(...
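For this input, lowercasing at the start (as suggested here) or at the end (as in the first answer) gives the same result. A sketch comparing both orders, assuming an ES2018+ engine for `\p{L}`; `pickWords` is a hypothetical helper.

```javascript
const text = 'This is a süPer NICE Sentence, am I right?';

// Hypothetical helper: letters-only cleanup, then up to 5 words of 2+ letters
// (assumes ES2018+ for the /u flag)
function pickWords(s) {
  const words = s.replace(/[^\p{L}\s]/gu, '').match(/\p{L}{2,}/gu) || [];
  return words.slice(0, 5);
}

// Lowercase the whole string first, then extract the words...
const lowerFirst = pickWords(text.toLowerCase()).join(', ');
// ...or extract first and lowercase only the joined result
const lowerLast = pickWords(text).join(', ').toLowerCase();

console.log(lowerFirst);               // this, is, süper, nice, sentence
console.log(lowerFirst === lowerLast); // true
```

Lowercasing at the end touches only the kept words, while lowercasing first means every later step sees lowercase text. The two orders can diverge for characters whose lowercase form is longer than the original, such as İ, which lowercases to i plus a combining dot.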
epascarello

Alternatively, you could lowercase each word in the array using Array#map:

console.log(words.map(function(word) { return word.toLowerCase(); }).join(', '));
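`Array#map` returns a new array and leaves the original untouched, which helps if the original casing is still needed elsewhere. A small standalone sketch, with the word array hard-coded for illustration (arrow functions assume ES2015+):

```javascript
const words = ['This', 'is', 'süPer', 'NICE', 'Sentence'];

// map() builds a new array of lowercased words; `words` itself is unchanged
const lowered = words.map((word) => word.toLowerCase());

console.log(lowered.join(', ')); // this, is, süper, nice, sentence
console.log(words[0]);           // This
```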

Joey Robert

You can use the string's toLowerCase method to convert the string to lowercase first, and then do all the manipulations you need on it.

For example: var string = 'This is a suPer NICE Sentence, am I right?'.toLowerCase();

sahil gupta