R REGEX Match - at least 1 lowercase letter, 1 number, and no special characters at 8 length

Question

I'm trying to create a grepl regex in R to match strings that:

Contain 1 or more lowercase letters
Contain 1 or more numbers
Only allow lowercase letters (a-z) or numbers, i.e. no spaces, no special characters, no other punctuation
The string must be exactly 8 characters long

However, my attempt so far doesn't work:

grepl("((?=.*[[:lower:]])(?=.*[[:digit:]])[[:alpha:]]{8})", x, perl=TRUE)

Any ideas where I'm going wrong?

Examples of inclusion cases would be: xxxxxxx8, 1234567x, ab12ef78

Examples of exclusion cases would be: x!3d5f78, x23456789, Ab123456

score 4 · Answer 1 · answered Aug 14 '18 at 04:57

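You're very close; you have the key concepts right (mainly lookahead assertions). The main problem is that `[[:alpha:]]` matches only letters, so the eight consumed characters can never include a digit. You could use this: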

grepl("((?=.*[[:lower:]])(?=.*[[:digit:]])[[:lower:][:digit:]]{8})", x, perl=TRUE)

Personally, I don't find it much more readable to use named character classes, so I'd write it like this:

grepl("^(?=.*[a-z])(?=.*\\d)[a-z\\d]{8}$", x, perl=TRUE)

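I also removed the outer parens (not necessary) and anchored the beginning & end. Without the anchors, the pattern also matches longer strings that merely contain a valid 8-character run, such as x23456789.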

Here are the results on your example inputs:

x <- c("xxxxxxx8", "1234567x", "ab12ef78", "x!3d5f78", "x23456789", "Ab123456")

grepl("^(?=.*[a-z])(?=.*\\d)[a-z\\d]{8}$", x, perl=TRUE)
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
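
To see what each change contributes, it can help to run the three patterns side by side. This is only an illustrative sketch; the variable names (`original`, `unanchored`, `anchored`) are mine and not part of the question.

```r
x <- c("xxxxxxx8", "1234567x", "ab12ef78", "x!3d5f78", "x23456789", "Ab123456")

# The pattern from the question: [[:alpha:]]{8} needs 8 consecutive letters,
# which none of these inputs contain.
original <- grepl("((?=.*[[:lower:]])(?=.*[[:digit:]])[[:alpha:]]{8})", x, perl = TRUE)

# Character class fixed, but no anchors: a 9-character string can still match
# because it contains a valid 8-character run.
unanchored <- grepl("((?=.*[[:lower:]])(?=.*[[:digit:]])[[:lower:][:digit:]]{8})", x, perl = TRUE)

# Character class fixed and anchored at both ends.
anchored <- grepl("^(?=.*[a-z])(?=.*\\d)[a-z\\d]{8}$", x, perl = TRUE)

print(data.frame(x, original, unanchored, anchored))
```

The `original` column is FALSE for every input, `unanchored` wrongly accepts x23456789, and `anchored` gives the expected result.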

answered Aug 14 '18 at 04:57

Ken Williams


Thank you! I'm glad that I wasn't too far off! I appreciate you taking the time! This is a useful scaffold! – SimonSchus Aug 14 '18 at 05:26
What about `grepl("^[a-z\\d]{8}$", x, perl = T)`? – s_baldur Aug 14 '18 at 08:39
@snoram that wouldn't ensure at least one letter and at least one number. – Ken Williams Aug 14 '18 at 15:11
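
A small check of the point made in the last comment, using test strings I made up for the purpose (`y` is not from the thread): the character-class-only pattern accepts strings that are all letters or all digits, while the lookahead version rejects them.

```r
# Hypothetical inputs: letters only, digits only, and a valid mix
y <- c("abcdefgh", "12345678", "abcd1234")

# Only checks the allowed characters and the length
simple <- grepl("^[a-z\\d]{8}$", y, perl = TRUE)

# Also requires at least one lowercase letter and at least one digit
with_lookaheads <- grepl("^(?=.*[a-z])(?=.*\\d)[a-z\\d]{8}$", y, perl = TRUE)

print(data.frame(y, simple, with_lookaheads))
```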

score 1 · Answer 2 · answered Aug 14 '18 at 08:32

You could also manage with very simple regex by breaking up your test:

grepl("[a-z]", x) & # Contain 1 or more lowercase letters
  grepl("\\d", x) & # Contain 1 or more numbers
  !grepl("[A-Z]|\\s|\\p{P}|\\p{S}", x, perl = TRUE) & # no upper, space, punctuation nor special char.
  nchar(x) == 8L # is 8 characters

[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
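
A possible variation on the same idea, offered as a sketch rather than a replacement: instead of listing what is forbidden (uppercase, whitespace, punctuation, symbols), reject anything outside the allowed set. As far as I can tell this also excludes characters the blacklist would let through, such as an accented lowercase letter. I pass `perl = TRUE` everywhere on the assumption that you want `[a-z]` to mean exactly the ASCII range regardless of locale; the name `ok` and the extra test string are mine.

```r
x <- c("xxxxxxx8", "1234567x", "ab12ef78", "x!3d5f78", "x23456789", "Ab123456",
       "abcd\u00e9123")  # extra, made-up case: 8 characters including an accented e

ok <- grepl("[a-z]", x, perl = TRUE) &       # at least one lowercase letter
  grepl("[0-9]", x, perl = TRUE) &           # at least one digit
  !grepl("[^a-z0-9]", x, perl = TRUE) &      # nothing outside a-z and 0-9
  nchar(x) == 8L                             # exactly 8 characters

print(ok)
```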

answered Aug 14 '18 at 08:32

s_baldur


That's true, and it would allow turning the individual criteria on and off separately. It'll be slower than one big regex, though, because the regex engine can make matching extremely efficient. And it's also possible to build up regexes from chunks and paste them together, so that's an alternative halfway between single-regex and many-regex. – Ken Williams Aug 14 '18 at 15:15
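
One way the "build it from chunks" idea in the comment above could look. This is a minimal sketch; `build_pattern` and its arguments are hypothetical names I chose for illustration, not something from the thread or from base R.

```r
# Assemble the single regex from optional lookahead chunks.
# Each criterion can be switched off via an argument.
build_pattern <- function(require_lower = TRUE, require_digit = TRUE, len = 8L) {
  paste0(
    "^",
    if (require_lower) "(?=.*[a-z])",   # NULL when FALSE; paste0 treats it as ""
    if (require_digit) "(?=.*\\d)",
    "[a-z\\d]{", len, "}$"
  )
}

x <- c("xxxxxxx8", "1234567x", "ab12ef78", "x!3d5f78", "x23456789", "Ab123456")

full <- build_pattern()
res_full <- grepl(full, x, perl = TRUE)

# Same pattern without the digit requirement
no_digit <- build_pattern(require_digit = FALSE)
res_letters_only <- grepl(no_digit, "abcdefgh", perl = TRUE)

print(full)
print(res_full)
```

With the defaults, `build_pattern()` produces the same pattern string as the single regex in the first answer.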
