I'm trying to use a regex like /[computer]{3,8}/ to match words that contain only the letters in "computer" and are 3 to 8 letters long. Instead, that regex matches all words containing ANY of the letters in [computer]. I've been looking at regular expression examples, but I can't quite figure it out...

How do I modify this regular expression to match words containing ONLY the letters in "computer" (with a length of 3 to 8)?

Some examples of what I want to match using the base word 'computer' would be:

put, mop, cut, term, cute, mom, putt, mute

(The end goal is to use each letter only once, but I can manage without that feature.)

Dfowj
  • I don't know Dojo, but the regex you describe will match a string of the letters in "computer", from 3 to 8 letters long. Like "uer". Clearly you're not happy with the outcome, but it's not clear what you want. – Beta May 28 '10 at 19:49
  • Any matches to a string comprised of the letters in 'computer' (3-8 in length) is in fact what i want. What i GET is 10 matches of the word 'Adenauer'... – Dfowj May 28 '10 at 20:37
  • Do you mean each letter in 'computer' can only be used one time in a correctly-matching word? If so, regex is not the way to go about this. It's possible, but brutal. – x1a4 May 28 '10 at 21:09
  • Can you show some examples of strings that should and should not be matched? – Chad Birch May 28 '10 at 21:09

3 Answers

Match word boundaries at both edges of your regex.

/\b[computer]{3,8}\b/
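
To see what the boundaries change, here is a minimal sketch that runs both patterns over the same text. The sample sentence is made up for illustration; "Adenauer" is the false positive mentioned in the question's comments.

```javascript
// Hypothetical sample text, not taken from the question's data.
var text = 'Adenauer put a mop in the computer';

// Without \b, the pattern also matches runs of letters inside longer words.
var loose = text.match(/[computer]{3,8}/g);

// With \b on both sides, only whole words built from those letters match.
var strict = text.match(/\b[computer]{3,8}\b/g);

console.log(loose.join(','));   // uer,put,mop,computer
console.log(strict.join(','));  // put,mop,computer
```

Note that \b only enforces whole-word matches; repeated letters (as in "putt") are still accepted.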
Quentin

The second part of your question ended up taking quite a lot of code. It would be simpler as a one-off, but it was more interesting to write a function that searches any string of text for words built from the letters of any word you feed it.

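function wordsfrom(s1, str){
    var tem, temp, ax;
    s1= s1.toLowerCase();
    // Candidates: whole words made only of the letters in s1, 3 to s1.length long
    var rx= RegExp('\\b(['+s1+']{3,'+s1.length+'})\\b','gi'),
    M= str.match(rx) || [];

    // Keep only candidates that use each letter no more often than s1 contains it
    return M.testEach(function(itm){
        tem= itm.toLowerCase().split('');
        temp= s1.split('');
        while(tem.length){
            ax= temp.indexAt(tem.pop());
            if(ax== -1) return false;
            temp.splice(ax, 1); // consume the matched letter
        }
        return true;
    });
}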



var s1= 'cut pat, rope, computers, putt, compote, come, put, mop, dog, comute';
alert(wordsfrom('computer', s1));

/*  returned value: (Array)
cut,rope,come,put,mop,comute
*/

This uses a couple of generic Array methods, useful for IE and workable in the others. They need to be defined before wordsfrom is called.

Replace them with whatever you would use for an indexOf and filter method.

Array.prototype.testEach= function(fun){
    var A= [], i= 0, itm, L= this.length;
    if(typeof fun== "function"){
        while(i < L){
            itm= this[i];
            if(fun(itm, i++)) A[A.length]= itm;
        }
    }
    return A;
}
Array.prototype.indexAt= function(what){
    var L= this.length;
    while(L) if(this[--L]=== what) return L;
    return -1;
}
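
Where Array.prototype.filter and Array.prototype.indexOf are available (ES5 and later), the same idea can be sketched without extending Array.prototype. The name wordsFromBuiltins and its variable names are hypothetical, and the sketch assumes the base word contains only letters, since other characters would need escaping inside the character class.

```javascript
// Hypothetical variant of wordsfrom that relies on built-in filter and indexOf.
function wordsFromBuiltins(base, text) {
    var letters = base.toLowerCase();
    // Assumption: base holds only letters, so it is safe inside [...]
    var rx = new RegExp('\\b[' + letters + ']{3,' + letters.length + '}\\b', 'gi');
    var candidates = text.match(rx) || [];

    return candidates.filter(function (word) {
        var pool = letters.split('');          // letters still available
        var chars = word.toLowerCase().split('');
        for (var i = 0; i < chars.length; i++) {
            var at = pool.indexOf(chars[i]);
            if (at === -1) return false;       // letter already used up
            pool.splice(at, 1);                // consume one occurrence
        }
        return true;
    });
}

var sample = 'cut pat, rope, computers, putt, compote, come, put, mop, dog, comute';
console.log(wordsFromBuiltins('computer', sample).join(','));
// cut,rope,come,put,mop,comute
```

Because letters are consumed from a pool, a base word with a doubled letter would allow that letter twice in a match.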
kennebec

I think something like this is what you want:

<script>

var s = "pu put puut mutep computer comp coomp retupmoc compux xputer";
s = s.replace(/\b((?!\w*(\w)\w*\2)[computer]{3,8})\b/g, "[$1]");
document.write(s);

</script>

This prints:

pu [put] puut [mutep] [computer] [comp] coomp [retupmoc] compux xputer

So it matches whole words that are [computer]{3,8}, but with no repeated character.

The no-repeat matching is done using negative lookahead on this pattern:

\w*(\w)\w*\2

This pattern tries to find a word that contains a repeated character. It does this by capturing a character and then seeing if it appears again later, allowing \w* in between.
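
As a rough illustration, the repeat detector can be tried on its own. In this sketch it is anchored with ^ (an addition for the standalone test), and the backreference is \1 rather than \2 because the outer capturing group of the full regex is absent here.

```javascript
// Standalone version of the lookahead's inner pattern; ^ is added for this test only.
var hasRepeat = /^\w*(\w)\w*\1/;

console.log(hasRepeat.test('puut'));      // true: "u" appears twice
console.log(hasRepeat.test('coomp'));     // true: "o" appears twice
console.log(hasRepeat.test('put'));       // false
console.log(hasRepeat.test('retupmoc'));  // false

// Wrapping it in (?! ... ) turns "has a repeat" into "must not have a repeat".
var noRepeat = /\b((?!\w*(\w)\w*\2)[computer]{3,8})\b/g;
console.log('put puut mutep'.match(noRepeat).join(','));  // put,mutep
```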

polygenelubricants