
In my answer to nginx location deny by file extension syntax, I've contemplated that the two separate regular expressions — one consisting of a bunch of filename extensions terminated by $, and another one being a filename path like /\. — might be faster as two separate locations than a joined one with the help of the choice | metacharacter.

What's faster with nginx when ngx_regex.c is using the pcre library for regular expressions?

The two expressions as separate locations:

location ~ \.(7z|bak|bz2|gz|rar|tar|zip)$ {return 403;}
location ~ /\. {return 403;}

Or the one where the above are joined together with |:

location ~ \.(7z|bak|bz2|gz|rar|tar|zip)$|/\. {return 403;}
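
As a sanity check that both forms deny the same set of paths, here is a minimal sketch in Python. It assumes that Python's `re` module treats these particular patterns the same way PCRE does (they only use basic syntax), and the sample paths are made up for illustration. It says nothing about speed.

```python
import re

# Patterns copied from the location blocks above.
EXTENSIONS = re.compile(r"\.(7z|bak|bz2|gz|rar|tar|zip)$")
DOTFILES = re.compile(r"/\.")
COMBINED = re.compile(r"\.(7z|bak|bz2|gz|rar|tar|zip)$|/\.")

# Hypothetical request paths, chosen only for illustration.
SAMPLE_PATHS = [
    "/",
    "/index.html",
    "/backup.tar",
    "/archive.tar.gz",
    "/.git/config",
    "/tarball",
    "/a.zipx",
]


def denied_separate(path):
    """Mimic two regex locations: deny if either pattern matches."""
    return bool(EXTENSIONS.search(path) or DOTFILES.search(path))


def denied_combined(path):
    """Mimic the single location that joins both patterns with |."""
    return bool(COMBINED.search(path))


if __name__ == "__main__":
    for path in SAMPLE_PATHS:
        print(path, denied_separate(path), denied_combined(path))
```

Since both locations return 403, the order in which nginx tries them does not change the outcome, only the amount of work done before a match or a no-match is reached.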

Let's have an optimistic view towards people trying to break in versus legitimate site visitors — assume that most input paths would not match either of the expressions above.

Which choice would result in a faster no-match?

Are there any end-of-line optimisations that are possible with the first expression that would not take place in the combined one?
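
For reference, the shape of the measurement I have in mind can be sketched as below. This is only an illustration of the harness: Python's `re` engine is not PCRE, so its timings would not answer the question for nginx, and the same loop would have to be ported to C against the pcre library to be meaningful. The request paths and the `scan` and `time_no_match` helpers are hypothetical.

```python
import re
import timeit

# Two separate patterns, tried in order, versus one combined pattern.
SEPARATE = [
    re.compile(r"\.(7z|bak|bz2|gz|rar|tar|zip)$"),
    re.compile(r"/\."),
]
COMBINED = [re.compile(r"\.(7z|bak|bz2|gz|rar|tar|zip)$|/\.")]

# Hypothetical paths that match neither expression (the common case).
NO_MATCH_PATHS = [
    "/",
    "/index.html",
    "/images/logo.png",
    "/blog/2015/10/some-long-article-title",
]


def scan(patterns, paths):
    """Count the paths matched by at least one of the patterns."""
    hits = 0
    for path in paths:
        for pattern in patterns:
            if pattern.search(path):
                hits += 1
                break
    return hits


def time_no_match(patterns, repeat=5, number=2000):
    """Best-of-`repeat` wall time, in seconds, for `number` full scans."""
    timer = timeit.repeat(
        lambda: scan(patterns, NO_MATCH_PATHS), repeat=repeat, number=number
    )
    return min(timer)


if __name__ == "__main__":
    print("separate:", time_no_match(SEPARATE))
    print("combined:", time_no_match(COMBINED))
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; it does not remove the engine-specific caveat above.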

cnst
  • Why do you not test them and see for yourself? – Racil Hilan Oct 10 '15 at 02:14
  • https://regex101.com/ has a nice feature called regex-debugger. You can use it to compare different regex versions. – hek2mgl Oct 10 '15 at 02:16
  • @hek2mgl, it's completely useless for this question -- it doesn't show ANY sort of performance-based information at all – cnst Oct 10 '15 at 05:44
  • @RacilHilan, tell me how (I don't want to re-invent the wheel) – cnst Oct 10 '15 at 05:45
  • @cnst that's not true. It will show you the number of steps the regex engine will take and that can approximate for you. – d0nut Oct 10 '15 at 05:55
  • @cnst you can use regexhero.net to do benchmarking. It'll show you the number of iterations per second that could be performed with that regular expression, though, it requires silverlight (which is unfortunate) – d0nut Oct 10 '15 at 05:56
  • @iismathwizard, your first comment makes absolutely no sense (wrt the question); your second comment fails to acknowledge that i'm asking specifically about the `pcre` library -- i have no idea what my browser has, or what those web-sites use or what measurement bugs or features they have – cnst Oct 10 '15 at 06:03
  • Why is this being downvoted whereas comments that make no sense are upvoted? Also, why are people posting so many comments instead of providing an answer? – cnst Oct 10 '15 at 06:27
  • @hek2mgl, no, absolutely do not agree. the steps that web-tool shows are the very basic steps; this question is about the finer low-level optimisations around end-of-line characters and such (and might moreover likely be different between different regex implementations) – cnst Oct 10 '15 at 07:03
  • After all, I think you simply choose the wrong headline. About your question itself; it is micro optimization you are talking about. Meaning it is a non problem. Having this, I would prefer the most readable solution which is in this case the version which uses two *location* settings. – hek2mgl Oct 10 '15 at 07:04
  • I would prefer it too as I explained in my own answer referenced in this question, but I'm just curious whether or not these kinds of end-of-line optimisation that i thought could exist actually do exist, or not. – cnst Oct 10 '15 at 07:06
  • Believe me, 99% of developers in the industry don't even think about the performance of their code. The other 1% will tell you that micro optimization makes no sense, but readability and clearness makes. – hek2mgl Oct 10 '15 at 07:51
  • @hek2mgl, as I said, you don't have to convince me, I totally agree that real difference in an average application would be tiny; but, regardless, I still want to know the answer! and you won't convince me that the answer itself doesn't matter, because it does actually matter — for the general education and to satisfy the curiosity! – cnst Oct 10 '15 at 07:56
  • @hek2mgl, why did you change the title to remove the whole question? this is ridiculous, you're just changing it solely because you thought it was just another clueless newbie regex question, and the new title is completely obscure and not in any way more clear than before; if anything, this is about PCRE, not about nginx; you're now going to draw the attention of the nginx people thinking that it's some sort of a trivial nginx question, when it's still about the deeper end of PCRE! (at least, thanks for the upvote replacing the downvote!) – cnst Oct 10 '15 at 08:02
  • The most important thing is the order of the individual alternatives. You would need to determine the likelihood of the occurrence of each individual pattern. Meaning if you determine that in 80% of cases a URL like .bak is called, it should occur as the first alternative. I hope this is clear. I would do that, just to make the life of my little computer friend a bit easier. – hek2mgl Oct 10 '15 at 08:03
  • Man, stop constantly complaining! I did that in order to give you an up-vote. Just because I'm a nice guy. Basically the question is not worth asking. If you are familiar with logic you should be able to answer it yourself. I'm out here. – hek2mgl Oct 10 '15 at 08:05
  • @hek2mgl, yes, that's a good point, which I forgot to counter mention in the question -- the likelihood of matches would be very small, most requests would likely not result in a match – cnst Oct 10 '15 at 08:09
  • @hek2mgl, I've edited the question, hopefully to be more clear, and also to point out your observation about the frequency kind of thing – cnst Oct 10 '15 at 08:23

0 Answers