The issue at hand is what is meant by "regular expression". Wikipedia has good information on this, but a simple summary is that a regular language is one defined with a few simple operations, including literal matches, alternation, and the Kleene star (match zero or more). Regex libraries have added features that don't extend this language but make it easier to use, such as being able to write `[a-z]` instead of `(a|b|c|d|e|f...|z)`.
Then along came Perl, which implemented support for regular expressions. However, instead of using the common NFA/DFA implementation, it implemented them using backtracking. This has two consequences: first, it allowed features beyond regular languages to be added, such as backreferences; and second, it can be really, really slow (exponential time in the worst case).
Many languages adopted these backtracking implementations, but there has been a somewhat recent resurgence of engines that drop the features that are difficult to implement efficiently, namely the ones that require backtracking, such as backreferences and lookaround. Go's `regexp` package does this, and the RE2 library is a C++ implementation of the same approach. And, as you've discovered, the `regex` crate also works this way. The advantage is that matching always takes time linear in the length of the input.
For your particular example, what you are trying to match is indeed still a regular language; it just has to be expressed differently. Let's start with the easy part: matching the characters while not allowing consecutive dots. Instead of thinking of it that way, think of it as matching an optional dot between characters, where the characters themselves are required. In other words, we can match with `[a-z0-9](\.?[a-z0-9])*`. We first match a single character; if you want to allow the name to start with a dot, you could remove this part. Then we need zero or more occurrences of an optional dot followed by a single non-dot character. You could append a `\.?` if you want to allow a dot at the end.
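Because the pattern is regular, it can also be recognized by a tiny deterministic automaton that looks at each character exactly once, which is the intuition behind the linear-time guarantee. The following is a hand-written, std-only illustration of such a DFA for `^[a-z0-9](\.?[a-z0-9])*$`; it is not how the `regex` crate is implemented internally, and the `State` and `dfa_match` names are hypothetical.

```rust
/// States of a hand-built DFA for `^[a-z0-9](\.?[a-z0-9])*$`.
#[derive(Clone, Copy, PartialEq)]
enum State {
    Start,     // nothing consumed yet
    AfterChar, // last symbol was [a-z0-9]; the only accepting state
    AfterDot,  // last symbol was '.'; a name character must follow
}

fn is_name_char(c: char) -> bool {
    c.is_ascii_lowercase() || c.is_ascii_digit()
}

/// One transition per character, so the running time is linear in the input.
fn dfa_match(name: &str) -> bool {
    let mut state = State::Start;
    for c in name.chars() {
        state = match state {
            _ if is_name_char(c) => State::AfterChar,
            State::AfterChar if c == '.' => State::AfterDot,
            _ => return false, // dead state: no way to recover
        };
    }
    state == State::AfterChar
}

fn main() {
    for name in &["valid123", "va.li.d.12.3", ".invalid", "invalid.", "double..dot"] {
        println!("{:?}: {}", name, dfa_match(name));
    }
}
```

A second dot arriving in `AfterDot` has no transition, which is exactly the "no consecutive dots" rule expressed without any lookahead.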
The second requirement, a length of 3 to 30 characters, would make this regex rather complicated, because our repeated unit is one or two characters long. I would suggest instead checking the length programmatically in addition to checking the regex. You could also write a second regex that checks the length and require that both match (regular languages are closed under intersection, but regex syntax has no "and" operator to express it).
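You may also find, depending on how you are matching, that you have to anchor the match (putting a `^` at the start and a `$` at the end). For example, `Regex::is_match` reports a match if the pattern matches anywhere in the input, not only the whole string.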
A solution to the full problem:
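```rust
use regex::Regex; // 1.3.5

fn main() {
    // Anchored so the whole name must match, not just a substring.
    let pat = Regex::new(r"^[a-z0-9](\.?[a-z0-9])*$").unwrap();

    let names = &[
        "valid123",
        "va.li.d.12.3",
        ".invalid",
        "invalid.",
        "double..dot",
        "ss",
        "really.long.name.that.is.too.long",
    ];

    for name in names {
        // The length limit is checked in code rather than in the regex.
        // `len()` counts bytes, which equals characters for ASCII names.
        let len = name.len();
        let valid = pat.is_match(name) && (3..=30).contains(&len);
        println!("{:?}: {:?}", name, valid);
    }
}
```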