I'm writing a regex to validate that inputs follow the link format I use on my site, for example /section/my-article-1?test=b
The requirements are:
- leading slash
- the path contains only alphanumerics, dashes and slashes
- query parameters are allowed
My regex is
/^((\/)[\dA-Za-z-]+)*(\/)?([&?=\dA-Za-z-])*$/;
This kinda works but it's not optimized.
GitHub code scanning (CodeQL) shows the warning 'Polynomial regular expression': https://codeql.github.com/codeql-query-help/java/java-polynomial-redos/
I assume that's because the character classes [\dA-Za-z-] and [&?=\dA-Za-z-] overlap, which could cause slow backtracking. But I'm not sure how to improve it while still allowing query parameters.
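To make the suspected overlap concrete, here's a rough sketch that counts how many ways a run of path characters can be divided between the two character classes. It's an illustration only, not how the regex engine actually works internally; the names `pathSegment`, `trailing` and `countSplits` are made up for this example.

```javascript
// Illustration only: count the ways one "/segment" can be split between
// the path class and the trailing class of my regex.
const pathSegment = /^\/[\dA-Za-z-]+$/; // a single "/segment" from the first group
const trailing = /^[&?=\dA-Za-z-]*$/;   // the trailing class (may be empty)

function countSplits(input) {
  let splits = 0;
  // Try every cut point: the part before the cut goes to the path segment,
  // the part after the cut goes to the trailing class.
  for (let cut = 2; cut <= input.length; cut++) {
    if (pathSegment.test(input.slice(0, cut)) && trailing.test(input.slice(cut))) {
      splits++;
    }
  }
  return splits;
}

console.log(countSplits("/aaaa"));     // 4
console.log(countSplits("/aaaaaaaa")); // 8
```

The number of valid splits grows with the length of the segment, so if the input ends in a character that neither class accepts, a backtracking engine may have to try each of these splits before giving up. I assume that's the kind of ambiguity the warning is about.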
How would I optimize the regex?
Here's some test data I've used:
SHOULD MATCH
/
/section
/section/article-1
/section/article-1/
/section/article-1?x=y&hello=world
SHOULD NOT MATCH
section/article-1
/section/!$*
/x(1)
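
For convenience, here's a small sketch of a harness for checking a candidate regex against the test data above. The names `linkPattern`, `shouldMatch`, `shouldNotMatch` and `failingCases` are just placeholders for this example.

```javascript
// My current regex, copied from above.
const linkPattern = /^((\/)[\dA-Za-z-]+)*(\/)?([&?=\dA-Za-z-])*$/;

const shouldMatch = [
  "/",
  "/section",
  "/section/article-1",
  "/section/article-1/",
  "/section/article-1?x=y&hello=world",
];

const shouldNotMatch = [
  "section/article-1",
  "/section/!$*",
  "/x(1)",
];

// Returns a description of every test case the given pattern gets wrong.
function failingCases(pattern) {
  const failures = [];
  for (const input of shouldMatch) {
    if (!pattern.test(input)) failures.push(`expected match: ${input}`);
  }
  for (const input of shouldNotMatch) {
    if (pattern.test(input)) failures.push(`expected no match: ${input}`);
  }
  return failures;
}

// An empty array means the pattern handles all listed cases as expected.
console.log(failingCases(linkPattern));
```

Any suggested replacement can be passed to `failingCases` in place of `linkPattern` to check it against the same inputs.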
PS: my current regex does allow multiple slashes after each other, which is undesirable, so preventing that would also be a bonus.