Every time I need to use a regex I realize I've forgotten everything about them.
I am trying to match all words that contain only lowercase alphanumeric characters AND have no doubled (immediately repeated) characters AND are 10 to 12 characters long.
Now, to figure out if a character is followed by the same character, I would do (.)\1. To see if a word is between 10 and 12 characters long I do {10,12}. To grab only lowercase letters and digits, I do [0-9a-z].
But how do I link them together?
Cheers!
PS: this will be running on a fairly large NLP XML file (100 MB+), so I would appreciate it if the regex wasn't the slowest alternative.
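For reference, one way the three pieces above might be linked is with a negative lookahead: anchor at a word boundary, reject the word if a doubled character appears anywhere in it, then require 10 to 12 characters from [0-9a-z] up to the next word boundary. This is only a sketch in Python, not a tuned solution. The names WORD_RE and find_words are made up for illustration, and it assumes "word" means a run delimited by \b, so lowercase XML tag or attribute names of the right length would match too.

```python
import re

# Hypothetical pattern combining the three fragments from the question:
#   \b                           start at a word boundary
#   (?![0-9a-z]*([0-9a-z])\1)    fail if any character is immediately repeated
#   [0-9a-z]{10,12}              10 to 12 lowercase letters or digits
#   \b                           end at a word boundary
# Compiled once so it can be reused across a large input.
WORD_RE = re.compile(r"\b(?![0-9a-z]*([0-9a-z])\1)[0-9a-z]{10,12}\b")


def find_words(text):
    """Return every whole-word match in text, in order of appearance."""
    # finditer + group(0) is used because findall would return only the
    # contents of the capturing group inside the lookahead.
    return [m.group(0) for m in WORD_RE.finditer(text)]


sample = "abcdefghij aabcdefghij abcdefghijklm Abcdefghijk abc1def2gh3 short"
print(find_words(sample))
```

For a 100 MB+ file, calling find_words on one line or chunk at a time would avoid holding the whole document in memory. Whether this is fast enough is something to measure rather than assume.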