I have several proxy rule files for Squid, and all contain rules like:
acl blacklisted dstdom_regex ^.*facebook\.* ^.*youtube\.* ^.*games.yahoo.com\.*
The patterns are matched against the domain name: dstdom_regex means regular-expression matching on the destination (server) domain.
The objective is to block some websites, but I can't tell which method was intended: matching the domain name, matching keywords in the domain name, ...
Let's break the pattern down, using stackexchange as the example keyword:
^.*stackexchange\.* The whole pattern
^ String beginning
.* Match anything (greedy quantifier, I presume)
stackexchange Keyword to match
\.* Zero or more literal dots (.)
Totally legitimate matches:
stackexchange.com: The Stack Exchange website.
stackoverflow.stackexchange: The imaginary Stack Exchange gTLD.
But these possible matches make it seem more like a keyword block:
stackexchange
stackexchanger
notstackexchange
not-stackexchange
some-website.stackexchange
some-website.stackexchange-tld
And the pattern seems to contain a bug, since it allows the following invalid cases to match, thanks to the \.* at the end, although they never naturally occur:
stackexchange.
stackexchange...
stackexchange..........
stackexchange.......com
stackexchange.com
stackexchangecom
- you get the idea.
Anything containing stackexchange, even if separated by dots from everything else, is still a valid match.
So now, the question itself:
This all means that this is simply a match for stackexchange! (I'm assuming the original author didn't intend to match infinite dots.)
So why not just use the pattern stackexchange? Wouldn't it be faster and give the same results, except for the "bug" (\.*)?
I.e., isn't ^.*stackexchange equivalent to stackexchange?
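To sanity-check my reasoning outside Squid, here is a small sketch in Python. It assumes that Python's re.search behaves like Squid's matching for patterns this simple, which I have not verified (Squid uses its own regex library, and case handling may differ). The last two host names are made up by me as non-matching controls; the rest come from the lists above.

```python
import re

# The pattern as it appears in the rule files, and the bare keyword.
original = re.compile(r"^.*stackexchange\.*")
keyword = re.compile(r"stackexchange")

hosts = [
    "stackexchange.com",
    "stackoverflow.stackexchange",
    "stackexchange",
    "stackexchanger",
    "notstackexchange",
    "not-stackexchange",
    "some-website.stackexchange",
    "some-website.stackexchange-tld",
    "stackexchange.",
    "stackexchange.......com",
    "stackexchangecom",
    "example.com",       # hypothetical control, should not match
    "stackexchang.com",  # hypothetical near miss, should not match
]


def same_verdict(host):
    """True if both patterns agree on whether the host matches."""
    return bool(original.search(host)) == bool(keyword.search(host))


for host in hosts:
    print(host, bool(original.search(host)), bool(keyword.search(host)))
```

If the two columns agree for every host, that at least suggests the ^.* prefix and the \.* suffix add nothing when the engine searches for the pattern anywhere in the string, but it says nothing about speed inside Squid.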
Edit: Just to clarify, I didn't write those proxy rule files.