Regex for words with no doubled characters and within a string length

Question

Every time I need to use a regex I realize I've forgotten everything about them.

I am trying to match all words that contain only lowercase alphanumeric characters AND have no doubled characters AND are between 10 and 12 characters long.

Now, to figure out if a character is followed by the same character, I would use (.)\1. To check that a word is between 10 and 12 characters long, I use {10,12}. To match only lowercase letters and digits, I use [0-9a-z].

But how do I link them together?

Cheers!

PS: this will be running on a fairly large NLP XML file (100 MB+), so I would appreciate it if the regex wasn't the slowest alternative.

score 3 · Accepted Answer · answered Jan 31 '13 at 19:58

I think this will do what you want:

/\b(?:([a-z0-9])(?!\1)){10,12}\b/

Explanation:

\b   // Word boundary
(?:
    ([a-z0-9])  // Match one lowercase letter or digit (captured as group 1)
    (?!\1)      // Not followed by the same character that was just matched
){10,12}        // Repeat 10 to 12 times
\b   // Word boundary
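
A quick way to check the pattern is to run it over a few sample words. The following is only a minimal Python sketch, not part of the original answer; the function name find_words and the sample strings are made up for illustration.

```python
import re

# The pattern from the answer, compiled once so it can be reused cheaply.
PATTERN = re.compile(r'\b(?:([a-z0-9])(?!\1)){10,12}\b')

def find_words(text):
    """Return every whole match in text (hypothetical helper)."""
    # finditer + group(0) is used because findall would return only
    # the contents of capture group 1, not the full word.
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = ("abcdefghij blahblahblaa abcdefghijkl "
          "aabbccddeeff abc123def45 abcdefghijklm")
print(find_words(sample))
# → ['abcdefghij', 'abcdefghijkl', 'abc123def45']
```

blahblahblaa and aabbccddeeff are rejected because of the doubled characters, and abcdefghijklm is rejected because it is 13 characters long. For a very large input, feeding the text through line by line instead of loading it all at once would probably keep memory use down, though that is an assumption about the data rather than something measured here.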

Rohit Jain

This will match blahblahbl in blahblahblaa. – Jeff-Meadows Jan 31 '13 at 20:00
+1 - Didn't see your answer while typing mine but you did get there first. Just need to add word boundaries, which I had forgotten too. – Andrew Cheong Jan 31 '13 at 20:02
@Jeff-Meadows. Ah! Just a matter of word boundary. – Rohit Jain Jan 31 '13 at 20:02

score 2 · Answer 2 · answered Jan 31 '13 at 20:00

Here's one, although I'm not sure there won't be a better way...

/\b(?:([a-z0-9])(?!\1)){10,12}\b/

Andrew Cheong

score 1 · Answer 3 · answered Jan 31 '13 at 20:01

Here is my attempt:

 (\b(?![0-9a-z]*([0-9a-z])\2)[0-9a-z]{10,12}\b)

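(The negative lookahead scans the whole word for a doubled character before any of it is consumed. It needs a boundary in front of it to work properly, because otherwise the engine can simply start the match later in the word, past the doubled pair. Hence the \b.)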

At the time of writing, another answer has a false positive, matching a part of eoeuaoarounn.
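
The false positive can be reproduced with a short Python sketch. This is an illustration only: the UNANCHORED pattern below is an assumption about what the other answer looked like before word boundaries were added (the repeated-group pattern with both \b removed), and first_match is a hypothetical helper.

```python
import re

# The lookahead pattern from this answer, verbatim. Group 2 sits inside
# the negative lookahead, so the backreference is \2.
LOOKAHEAD = re.compile(r'(\b(?![0-9a-z]*([0-9a-z])\2)[0-9a-z]{10,12}\b)')

# Assumed earlier form of the other answer: no word boundaries.
UNANCHORED = re.compile(r'(?:([a-z0-9])(?!\1)){10,12}')

def first_match(pattern, text):
    """Return the first full match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(UNANCHORED, "eoeuaoarounn"))  # → eoeuaoarou
print(first_match(LOOKAHEAD, "eoeuaoarounn"))   # → None
print(first_match(LOOKAHEAD, "abcdefghij"))     # → abcdefghij
```

Without boundaries, the match simply stops just before the doubled nn and returns the first 10 characters. With \b at both ends, the partial match is rejected.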
