4

I am trying to remove all words of 3 characters or fewer, such as in, on, the, etc.

My code does not work; it throws Uncaught TypeError: Object ... has no method 'replace'. Can anyone help?

var str = 'Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.';
var newstr = str.split(" ").replace(/(\b(\w{1,3})\b(\s|$))/g,'');
alert(newstr);
cj333

7 Answers

9

You need to change the order of split and replace:

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");

Otherwise, you end up calling replace on an array, which does not have this method.
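A quick check makes the cause of the TypeError visible. This is only a minimal sketch; the sample string and the variable name `words` are illustrative and not taken from the question.

```javascript
// split() returns an Array, and Array has no replace() method.
const words = 'in the garden'.split(' ');

console.log(Array.isArray(words));           // true
console.log(typeof words.replace);           // "undefined"
console.log(typeof 'in the garden'.replace); // "function"
```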

Note: Your current regex does not correctly handle the case where a "short" word is immediately followed by a punctuation character. You can change it slightly to do that:

/(\b(\w{1,3})\b(\W|$))/g
                ^^

Apart from that, you also have to take care of the fact that the resulting array may contain empty strings (because deleting consecutive short words separated by spaces will end up leaving consecutive spaces in the string before it's split). So you might also want to change how you split. All of this gives us:

var newstr = str.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/);
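As a rough illustration of the combined fix on plain ASCII text, the sketch below wraps the line above in a function. The function name, the sample sentence, and the extra `filter(Boolean)` step are assumptions added here; the filter guards against empty strings that can remain at the start or end of the result.

```javascript
// Sketch for ASCII-only input; name and sample sentence are illustrative.
function removeShortAsciiWords(text) {
  return text
    .replace(/(\b(\w{1,3})\b(\W|$))/g, '') // drop 1-3 character words plus one trailing non-word character
    .split(/\s+/)                          // split on runs of whitespace
    .filter(Boolean);                      // added safeguard: drop empty strings
}

console.log(removeShortAsciiWords('The cat sat on a large mat in the garden.'));
// ["large", "garden."]
```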

Update: As Ray Toal correctly points out in a comment, in JavaScript regexes \w does not match non-ASCII characters (e.g. characters with accents). This means that the above regexes will not work correctly on this input (they would work correctly in certain other flavors of regex). Unfortunately, there is no convenient way around that, and you will have to replace \w with a character class such as [a-zA-Zéǔí], and do the converse for \W.
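In newer engines (ES2018 and later), Unicode property escapes such as \p{L} with the u flag are an alternative to hand-written character classes. The sketch below assumes such an engine, including lookbehind support; the function name `stripShortWords` and the `maxLen` parameter are hypothetical, and the NFC normalization step is an added precaution for accents stored as combining marks.

```javascript
// Assumes an ES2018+ engine (Unicode property escapes and lookbehind).
function stripShortWords(text, maxLen = 3) {
  // A run of 1..maxLen letters that is not part of a longer run,
  // plus at most one following non-letter (space or punctuation).
  const shortWord = new RegExp(
    `(?<!\\p{L})\\p{L}{1,${maxLen}}(?!\\p{L})[^\\p{L}]?`, 'gu');
  return text
    .normalize('NFC')        // compose letter + combining accent pairs
    .replace(shortWord, '')
    .split(/\s+/)
    .filter(Boolean);        // drop empty strings
}

// \u00e9 is é and \u01d4 is ǔ
console.log(stripShortWords('Proin n\u00e9c turpis eget dolor dict\u01d4m'));
// ["Proin", "turpis", "eget", "dolor", "dictǔm"]
```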

Update:

Ugh, doing this in JavaScript regex is not easy. I came up with this regex:

([^ǔa-z\u00C0-\u017E]([ǔa-z\u00C0-\u017E]{1,3})(?=[^ǔa-z\u00C0-\u017E]|$))

...which I still don't like because I had to manually include the ǔ in there.
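For reference, here is one way the pattern might be applied. The answer shows only the pattern, so the g and i flags, the variable names, and the sample string are assumptions; \u01D4 is simply the escaped form of ǔ.

```javascript
// Flags g and i are assumed; the answer does not show them.
const shortWordPattern =
  /([^\u01D4a-z\u00C0-\u017E]([\u01D4a-z\u00C0-\u017E]{1,3})(?=[^\u01D4a-z\u00C0-\u017E]|$))/gi;

// \u00e9 is é and \u01d4 is ǔ
const sampleText = 'Proin n\u00e9c turpis e\u01d4 porta in, faucibus';
const cleanedText = sampleText.replace(shortWordPattern, '');

console.log(cleanedText); // "Proin turpis porta, faucibus"
```

Because the pattern starts by consuming one non-letter character, a short word at the very beginning of the string is left in place.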

Jon
  • Why use split at all? Either use split and check the length of each item you get in the resulting array, or use a regex, and don't bother splitting the string at all. – GolezTrol Sep 11 '11 at 20:00
  • @GolezTrol: `split` by itself will produce "words" that include non-alpha characters and whose length may be more than 3 due to the presence of these characters. This is not the behavior that the OP wants. And I presume that `split` is also necessary at some point so they can loop over the results. – Jon Sep 11 '11 at 20:02
  • @Jon sorry to spoil your accepted answer but your fiddle does not produce the right answer! Look carefully, you have the word "dictǔ" in there. You removed the m because `\w` does not match accented letters. – Ray Toal Sep 11 '11 at 20:18
  • @Ray Toal, so what should I do, open the answer again? I also found something wrong... and which answer do you think is better? – cj333 Sep 11 '11 at 20:28
  • Thanks Jon and Ray Toal. Apart from `[a-zA-Zéǔí]`, I will search Google to see whether anyone has written a function for non-English words. – cj333 Sep 11 '11 at 20:38
  • @RayToal: There's nothing to be sorry about, thank you for pointing it out (quickly scanning by eye did not allow me to catch this). Unfortunately this means that the regex will get ugly -- there's no surefire easy way to match accented letters in JS. – Jon Sep 11 '11 at 20:42
  • @cj333: I have updated the answer again to account for *most* accented characters (using Windows' `charmap` utility to find ranges of accented characters in Unicode). It works, but I wouldn't say it's 100% correct. You will have to do some tests to be sure. – Jon Sep 11 '11 at 20:46
  • Info on JS and Unicode here: http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode. Also see http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin. – Ray Toal Sep 11 '11 at 21:08
4

Try this:

str = str.split( ' ' ).filter(function ( str ) {
    var word = str.match(/(\w+)/);
    return word && word[0].length > 3;
}).join( ' ' );

Live demo: http://jsfiddle.net/sTfEs/1/

Šime Vidas
2
var words = str.split(" "); //Turns the string into an array of words
var longWords = []; //Initialize array
for(var i = 0; i<words.length; i++){
    if(words[i].length > 3) {
        longWords.push(words[i]);
    }
}
var newString = longWords.join(" "); //Create a new string of the words separated by spaces.
Dennis
  • Will storing the words in an array cost more memory during JavaScript processing? This always puzzles me, and I have been afraid to try. – cj333 Sep 11 '11 at 20:10
  • The regex will take time and memory as well. It would take testing to figure out how much each would require. Just figure out what works and use that. – Dennis Sep 11 '11 at 20:40
2

str.split(" ") returns an array, which does not have a replace method.

Secondly, you probably should not use regexes for this. JavaScript does not have good support for non-ASCII letters in regexes. See "Regular expression to match non-English characters?". If you need to use a regex, there are hints in there.

And BTW, in JavaScript, \w{1,3} DOES NOT match "néc". As you probably know, \w is [A-Za-z0-9_]. See http://jsfiddle.net/3YWSC/ for an example.
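The behaviour is easy to confirm in a console. A small sketch, with an illustrative variable name and \u00e9 standing in for é:

```javascript
// In JavaScript, \w matches only ASCII letters, digits and underscore.
const asciiWord = /^\w+$/;

console.log(asciiWord.test('nec'));      // true
console.log(asciiWord.test('n\u00e9c')); // false: é is not matched by \w
console.log('n\u00e9c'.match(/\w+/g));   // ["n", "c"]
```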

Are you only trying to match runs of non-space characters? Or are you looking for words of three or fewer letters only? On the one hand you split on spaces, but on the other you used \w. I would go with something like Dennis's answer.

Ray Toal
0

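Using lodash, in a short one-liner (note that _.uniq also drops duplicate words, and _.remove returns the elements it removed, here the words longer than 3 characters):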

let a = ['la','rivière','et','le','lapin','sont','dans','le','près'];

a = _.remove(_.uniq(a),n=>_.size(n)>3); // ['rivière','lapin','sont','dans','près']
Brice
0

Using the filter method:

let sentence = "Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo .";

let sent = sentence.split(" ").filter((ele) => ele.length > 3).join(" ");

console.log(sent);
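Note that `ele.length` also counts attached punctuation, so a token like "est." (4 characters) is kept even though the word has 3 letters. A possible variation, sketched under the assumption of an ES2018+ engine, counts only letters; the function name `keepLongWords` and the `minLetters` parameter are hypothetical.

```javascript
// Assumes an ES2018+ engine (Unicode property escapes).
function keepLongWords(text, minLetters = 4) {
  return text
    .split(/\s+/)
    // count letters only, so trailing commas and periods are ignored
    .filter((token) => (token.match(/\p{L}/gu) || []).length >= minLetters)
    .join(' ');
}

// \u00e8 is è
console.log(keepLongWords('porta in, faucibus s\u00e8d magna. est.'));
// "porta faucibus magna."
```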
Mohamad
0

Try

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");
evilone
damon