I am working on optimizing a regex (regular expression) that will match the following URL schema:
protocol://anything1/folder/index.html[?param=anything2]
where the part in square brackets is optional, and anything1 and anything2 can each be any sequence of characters. Everything else is a static literal.
If it matters, anything1 will range in length from 36 to 48 characters, and anything2 will range in length from 5 to 40 characters. The regex does not need to validate any of this.
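If the length ranges were worth encoding, a bounded-quantifier variant might look like the sketch below. It is stricter than the question requires, because it does validate the lengths. The sample values are made up, and whether `.{36,48}` is actually faster than `.+` in Firefox's engine is an assumption that would need measuring.

```javascript
// Hypothetical variant that encodes the stated length ranges:
// anything1 is 36-48 characters, anything2 is 5-40 characters.
// Any speed benefit over `.+` is an assumption, not a measured fact.
const bounded = /^protocol:\/\/.{36,48}\/folder\/index\.html(?:\?param=.{5,40})?$/;

const host = "a".repeat(40); // 40 characters, inside the 36-48 range
const url1 = `protocol://${host}/folder/index.html`;
const url2 = `protocol://${host}/folder/index.html?param=abc/def`; // 7-character value

console.log(bounded.test(url1)); // true
console.log(bounded.test(url2)); // true
// "short" is only 5 characters, below the 36-character minimum for anything1.
console.log(bounded.test("protocol://short/folder/index.html")); // false
```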
Importantly, both anything1 and anything2 can include forward slashes.
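Slashes inside anything1 and anything2 do not trip up the pattern from the question: `.+` is greedy and backtracks until the literal `/folder/index.html` can match. A quick JavaScript check with invented values:

```javascript
// The pattern from the question, unchanged.
const re = /^protocol:\/\/.+\/folder\/index\.html($|\?param=.+)/;

// Made-up URL where both anything1 ("x/y/folder/z") and anything2 ("a/b/c")
// contain forward slashes.
const url = "protocol://x/y/folder/z/folder/index.html?param=a/b/c";

// `.+` first consumes the rest of the line, then gives characters back
// until "/folder/index.html" followed by "?param=" can match.
console.log(re.test(url)); // true
```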
It is fine if the regex requires anything1 or anything2 to be at least one character long, since they always will be; but because performance matters most, I'm equally fine if it matches zero or more characters for either or both of them.
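The choice between `.+` and `.*` only changes the result when anything1 or anything2 is empty. A small JavaScript comparison; the `star` variant is a hypothetical rewrite, not something from the question:

```javascript
const plus = /^protocol:\/\/.+\/folder\/index\.html($|\?param=.+)/;
const star = /^protocol:\/\/.*\/folder\/index\.html($|\?param=.*)/;

// The two patterns disagree only on empty anything1 or anything2.
const emptyHost = "protocol:///folder/index.html";
const emptyParam = "protocol://h/folder/index.html?param=";
const normal = "protocol://h/folder/index.html?param=v";

console.log(plus.test(emptyHost), star.test(emptyHost));   // false true
console.log(plus.test(emptyParam), star.test(emptyParam)); // false true
console.log(plus.test(normal), star.test(normal));         // true true
```

Since neither part is ever empty in practice, both variants accept the same real inputs. Any speed difference between them is likely small, but that is a guess rather than a measurement.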
The regex is only used for matching, and not for parsing. Captured groups are not used elsewhere in the code.
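Because the captured group is never used, it could be written as a non-capturing group `(?:...)`. This sketch checks, on a few made-up URLs, that both forms give the same answer through `RegExp.prototype.test`, which returns just a boolean. Whether the engine does less work for the non-capturing form is an implementation detail that is not verified here.

```javascript
// Pattern from the question (capturing) and a non-capturing rewrite.
const capturing = /^protocol:\/\/.+\/folder\/index\.html($|\?param=.+)/;
const nonCapturing = /^protocol:\/\/.+\/folder\/index\.html(?:$|\?param=.+)/;

const samples = [
  "protocol://abc/folder/index.html",           // no query string
  "protocol://abc/folder/index.html?param=xyz", // with the optional param
  "protocol://abc/folder/index.html?other=xyz", // wrong parameter name
  "protocol://abc/folder/index.htm",            // wrong file name
];

// Neither regex has the `g` flag, so test() keeps no state between calls.
for (const s of samples) {
  console.log(s, capturing.test(s), nonCapturing.test(s));
}
```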
Most importantly, I would like the regex to be as efficient (with regard to speed) as possible.
So far, I have:
^protocol://.+/folder/index\.html($|\?param=.+)
The regex must match the entire string, and not just part of it.
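One subtlety: in the pattern from the question, only the `$` branch is anchored at the end. Because `.` does not match line terminators, the `\?param=.+` branch can succeed without reaching the end of the input. Putting a single `$` at the end of the pattern closes that gap. Real URLs should not contain raw newlines, so this may never matter in practice; the `anchored` pattern below is a hypothetical rewrite:

```javascript
const original = /^protocol:\/\/.+\/folder\/index\.html($|\?param=.+)/;
// Hypothetical rewrite: optional non-capturing query group, then one `$`
// that anchors the end of the input for both cases.
const anchored = /^protocol:\/\/.+\/folder\/index\.html(?:\?param=.+)?$/;

// `.` stops at "\n", so the original can match only the "x" after "param=".
const withNewline = "protocol://a/folder/index.html?param=x\ntrailing";
const normalUrl = "protocol://a/folder/index.html?param=x";

console.log(original.test(withNewline)); // true: the match ends before "\n"
console.log(anchored.test(withNewline)); // false: `$` requires the end of input
console.log(original.test(normalUrl), anchored.test(normalUrl)); // true true
```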
The regex engine is the one used internally by Firefox for its CSS engine (which I believe is the same as their JavaScript regex engine).
My regex works as expected, and I'm asking if it can be further optimized for performance.
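Whether any rewrite is actually faster in Firefox can only be settled by measuring. Below is a rough micro-benchmark sketch, assuming a browser console or Node.js 16+ (where `performance.now()` is a global). The input strings and the `anchored` candidate are invented for illustration, and timings will vary with the engine, JIT warm-up, and input.

```javascript
// Candidate patterns: the one from the question and a hypothetical rewrite.
const candidates = {
  original: /^protocol:\/\/.+\/folder\/index\.html($|\?param=.+)/,
  anchored: /^protocol:\/\/.+\/folder\/index\.html(?:\?param=.+)?$/,
};

// Made-up inputs with lengths in the ranges stated in the question.
const inputs = [
  `protocol://${"a/".repeat(20)}/folder/index.html`,                         // matches
  `protocol://${"b".repeat(40)}/folder/index.html?param=${"c/".repeat(10)}`, // matches
  `protocol://${"d".repeat(40)}/folder/other.html`,                          // no match
];

// Runs every input through `re` `iterations` times; returns elapsed ms
// and the number of successful matches.
function bench(re, iterations) {
  let matches = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    for (const s of inputs) if (re.test(s)) matches++;
  }
  return { ms: performance.now() - start, matches };
}

for (const [name, re] of Object.entries(candidates)) {
  const { ms, matches } = bench(re, 100000);
  console.log(`${name}: ${ms.toFixed(1)} ms, ${matches} matches`);
}
```

Checking that both candidates report the same match count is a cheap sanity check that a faster pattern has not silently changed what it accepts.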