javascript remove words less than 3 characters

Question

I am trying to remove all the words less than 3 characters, like in, on, the....

My code does not work; it throws Uncaught TypeError: Object ... has no method 'replace'. Can anyone help?

var str = 'Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.';
var newstr = str.split(" ").replace(/(\b(\w{1,3})\b(\s|$))/g,'');
alert(newstr);

No need for the split, but `\W?` is needed or you will not get `leo.` – mplungjan, Sep 11 '11 at 20:00
A second big problem in here and in most of the answers is the use of `\w` which does not match accented characters. — Ray Toal, Sep 11 '11 at 20:19

Jon · Accepted Answer · 2011-09-11T20:45:17.353

9

You need to change the order of split and replace:

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");

Otherwise, you end up calling replace on an array, which does not have this method.

See it in action.

Note: Your current regex does not correctly handle the case where a "short" word is immediately followed by a punctuation character. You can change it slightly to do that:

/(\b(\w{1,3})\b(\W|$))/g
                ^^

Apart from that, you also have to take care of the fact that the resulting array may contain empty strings (because deleting consecutive short words separated by spaces will end up leaving consecutive spaces in the string before it's split). So you might also want to change how you split. All of this gives us:

var newstr = str.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/);

See it in action.
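
A minimal sketch of the final version above, run on an ASCII-only sample so that \w behaves as expected (the sample string and variable names are made up for illustration). It suggests that one empty string can still survive split(/\s+/) when the last word is removed, so filtering out empty entries afterwards is one option:

```javascript
// Illustrative ASCII-only input; not taken from the question.
const sample = "porta in, faucibus nec leo.";

// Remove words of 1-3 characters together with the character that follows them.
const stripped = sample.replace(/(\b(\w{1,3})\b(\W|$))/g, "");

// Splitting on whitespace runs avoids empty entries in the middle,
// but the trailing space left behind still yields one empty string at the end.
const rawWords = stripped.split(/\s+/);

// Drop any empty strings left at the edges.
const words = rawWords.filter(Boolean);

console.log(rawWords); // [ 'porta', 'faucibus', '' ]
console.log(words);    // [ 'porta', 'faucibus' ]
```

Calling trim() on the string before splitting would be another way to avoid the trailing entry.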

Update: As Ray Toal correctly points out in a comment, in JavaScript regexes \w does not match non-ASCII characters (e.g. characters with accents). This means that the above regexes will not work correctly (they will work correctly on certain other flavors of regex). Unfortunately, there is no convenient way around that and you will have to replace \w with a character group such as [a-zA-Zéǔí], and do the converse for \W.

Update:

Ugh, doing this in JavaScript regex is not easy. I came up with this regex:

([^ǔa-z\u00C0-\u017E]([ǔa-z\u00C0-\u017E]{1,3})(?=[^ǔa-z\u00C0-\u017E]|$))

...which I still don't like because I had to manually include the ǔ in there.

See it in action.
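
For readers on a newer engine: ES2018 added Unicode property escapes, so a hand-built character range may no longer be necessary. The following is only a sketch under that assumption, not what this answer originally used; removeShortWords and maxLen are hypothetical names.

```javascript
// Sketch: remove words of at most `maxLen` letters using Unicode property escapes.
// Assumes an ES2018+ engine (the `u` flag with \p{L} and \p{M}).
function removeShortWords(text, maxLen = 3) {
  // A "word" here is a run of letters (\p{L}) and combining marks (\p{M}).
  const wordRun = /[\p{L}\p{M}]+/gu;
  return text
    // Count code points of the NFC-normalized word so that "é" counts as one letter.
    .replace(wordRun, (word) =>
      [...word.normalize("NFC")].length <= maxLen ? "" : word
    )
    // Collapse the gaps left by removed words.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(removeShortWords("Proin n\u00E9c turpis eget dolor dict\u01D4m lac\u00EDnia."));
// Proin turpis eget dolor dictǔm lacínia.
```

Punctuation that followed a removed word stays in place (for example "porta in, faucibus" becomes "porta , faucibus"), so it would need separate handling if that matters.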

edited Sep 11 '11 at 20:45

answered Sep 11 '11 at 19:53

Jon


Why use split at all? Either use split and check the length of each item you get in the resulting array, or use a regex, and don't bother splitting the string at all. – GolezTrol Sep 11 '11 at 20:00
@GolezTrol: `split` by itself will produce "words" that include non-alpha characters and whose length may be more than 3 due to the presence of these characters. This is not the behavior that the OP wants. And I presume that `split` is also necessary at some point so they can loop over the results. – Jon Sep 11 '11 at 20:02
@Jon sorry to spoil your accepted answer but your fiddle does not produce the right answer! Look carefully, you have the word "dictǔ" in there. You removed the m because `\w` does not match accented letters. – Ray Toal Sep 11 '11 at 20:18
@Ray Toal, so what should I do, open the answer again? I also found something wrong... and which answer do you think is more perfect? – cj333 Sep 11 '11 at 20:28
Thanks Jon and Ray Toal. Besides `[a-zA-Zéǔí]`, I will search Google to see whether anyone has written a function for non-English words. – cj333 Sep 11 '11 at 20:38
@RayToal: There's nothing to be sorry about, thank you for pointing it out (quickly scanning by eye did not allow me to catch this). Unfortunately this means that the regex will get ugly -- there's no surefire easy way to match accented letters in JS. – Jon Sep 11 '11 at 20:42
@cj333: I have updated the answer again to account for *most* accented characters (using Windows' `charmap` utility to find ranges of accented characters in Unicode). It works, but I wouldn't say it's 100% correct. You will have to do some tests to be sure. – Jon Sep 11 '11 at 20:46
Info on JS and Unicode here http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode. Also see http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin. – Ray Toal Sep 11 '11 at 21:08

Šime Vidas · Answer 2 · 2011-09-11T20:08:47.497

4

Try this:

str = str.split( ' ' ).filter(function ( str ) {
    var word = str.match(/(\w+)/);
    return word && word[0].length > 3;
}).join( ' ' );

Live demo: http://jsfiddle.net/sTfEs/1/

edited Sep 11 '11 at 20:08

answered Sep 11 '11 at 19:56

Šime Vidas


Your fiddle dropped the word `"lacínia"` – Ray Toal Sep 11 '11 at 20:21

score 2 · Answer 3 · answered Sep 11 '11 at 19:57

2

var words = str.split(" "); //Turns the string into an array of words
var longWords = []; //Initialize array
for(var i = 0; i<words.length; i++){
    if(words[i].length > 3) {
        longWords.push(words[i]);
    }
}
var newString = longWords.join(" "); //Create a new string of the words separated by spaces.

answered Sep 11 '11 at 19:57

Dennis


Will storing the words in an array cost more memory during JavaScript processing? This always puzzles me, and I am afraid to try. – cj333 Sep 11 '11 at 20:10
The regex will take time and memory as well. It would take testing to figure out how much each would require. Just figure out what works and use that. – Dennis Sep 11 '11 at 20:40

score 2 · Answer 4 · edited May 23 '17 at 12:02

str.split(" ") returns an array, which does not have a replace method.

Secondly, you probably shouldn't use regexes for this. JavaScript does not have good support for non-ASCII letters in regexes. See "Regular expression to match non-English characters?". If you need to use a regex, there are hints in there.

And BTW, in JavaScript regexes, \w{1,3} DOES NOT match "néc" as a whole. As you probably know, \w is [A-Za-z0-9_]. See http://jsfiddle.net/3YWSC/ for an example.
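
A quick check of that behaviour, as a sketch (the variable name is illustrative, and the last line assumes an ES2018+ engine):

```javascript
// "n\u00E9c" is "néc" written with an escape so the file encoding does not matter.
const accented = "n\u00E9c";

console.log(/^\w+$/.test("nec"));         // true
console.log(/^\w+$/.test(accented));      // false: é is outside [A-Za-z0-9_]
console.log(accented.match(/\w+/g));      // [ 'n', 'c' ]
console.log(/^\p{L}+$/u.test(accented));  // true with Unicode property escapes
```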

Are you only trying to match words of non-spaces? Or are you looking for words of three or fewer letters only? On the one hand you split across spaces, but on the other you used \w. I would go with something like Dennis's answer.

score 0 · Answer 5 · answered May 11 '17 at 10:00

0

Using lodash with less than 20 chars:

let a = ['la','rivière','et','le','lapin','sont','dans','le','près'];

a = _.remove(_.uniq(a),n=>_.size(n)>3); // ['rivière','lapin','sont','dans','près']

answered May 11 '17 at 10:00

Brice


score 0 · Answer 6 · answered Oct 07 '21 at 12:30

Using the filter method:

let sentence = "Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo .";

let sent = sentence.split(" ").filter((ele) => ele.length > 3).join(" ");

console.log(sent);
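
One caveat with a plain length check: it counts punctuation too, so a token such as `leo.` has length 4 and would be kept. A sketch that counts only letters instead, assuming an ES2018+ engine for \p{L}; keepLongTokens and minLetters are hypothetical names.

```javascript
// Sketch: keep only tokens that contain at least `minLetters` letters.
// Punctuation attached to a token is ignored when counting, but kept in the output.
function keepLongTokens(sentence, minLetters = 4) {
  return sentence
    .split(/\s+/)
    .filter((token) => {
      // Normalize so that an accented letter counts once, then collect the letters.
      const letters = token.normalize("NFC").match(/\p{L}/gu) || [];
      return letters.length >= minLetters;
    })
    .join(" ");
}

console.log(keepLongTokens("porta in, faucibus s\u00E8d magna."));
// porta faucibus magna.
```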

score 0 · Answer 7 · edited Sep 11 '11 at 19:55

0

Try

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");

edited Sep 11 '11 at 19:55

evilone


answered Sep 11 '11 at 19:54

damon


The OP has characters in there that do not match `\w`. – Ray Toal Sep 11 '11 at 20:20
