
Let's say I have a DB URL string that looks like this:

"mysql2://foo:bar@baz.com/fizz?reconnect=true"

I came up with these regexes for extracting the username, password, and host name:

/\w:\/\/(\w+):/ # extracts username ("foo")
/\w:\/\/\w+:(\w+)/ # extracts password ("bar")
/\w:\/\/\w+:\w+@([\w+-\/]+)/ # extracts host name ("baz.com")

How can these regexes be improved or made more efficient?

dimitry_n
  • It's fairly safe to say that the worst-case runtime of a backtracking regex engine is not polynomial, so regex isn't an efficient approach for large data anyway. For small data like this, it doesn't really matter. – Mechanic Apr 03 '20 at 17:42
  • Check out this SO question: [What's the Time Complexity of Average Regex algorithms?](https://stackoverflow.com/a/5892130/5953610) – Mechanic Apr 03 '20 at 17:50
  • Thanks! Very good point about data size! – dimitry_n Apr 03 '20 at 18:06

1 Answer


Here's a single regex that combines your three into one, with three capturing groups:

\w:\/{2}(\w+):(\w+)@(\w+\.\w+)
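As a rough sketch of how the combined pattern might be used, here it is in Ruby against the sample URL from the question. The group names (`user`, `password`, `host`) and the variable names are only illustrative, and note that `\w+\.\w+` assumes a two-label host such as `baz.com`.

```ruby
# Sample URL taken from the question.
url = "mysql2://foo:bar@baz.com/fizz?reconnect=true"

# The combined pattern, rewritten with named groups.
# The group names are illustrative, not part of the original regex.
db_url_pattern = /\w:\/{2}(?<user>\w+):(?<password>\w+)@(?<host>\w+\.\w+)/

m = db_url_pattern.match(url)

# match returns nil when the URL does not fit the pattern, so guard for that.
user     = m && m[:user]
password = m && m[:password]
host     = m && m[:host]

puts "user=#{user} password=#{password} host=#{host}"
```

One match now does the work of three, and a URL that does not fit simply leaves all three values as `nil`.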

Your regexes already look straightforward and fast, but here's a good tool for testing regexes: https://regex101.com/. It shows how many steps a match takes on your samples, along with the capture groups. For me it's one of the first tools I pull up when working on a new regex that isn't simple.

As for improving regexes, the goal is to make the engine perform as few steps as possible, so quick matching and quick failure both help. For example, if the scheme is always mysql2, you can start the regex with 2:\/{2} instead of \w:\/{2}, which cuts out 10 steps for the regex above.
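A variation on the same idea, assuming the scheme really is always `mysql2`: spell the scheme out literally and anchor the pattern at the start of the string with `\A`, so a non-matching input is rejected at position 0 instead of being retried at every offset. This is only a sketch; the variable names are made up for illustration.

```ruby
url = "mysql2://foo:bar@baz.com/fizz?reconnect=true"

# Assumption: every URL starts with the literal scheme "mysql2".
# \A pins the match to the start of the string, so there is a single
# starting position to try and failure is immediate.
anchored = /\Amysql2:\/\/(\w+):(\w+)@(\w+\.\w+)/

# captures returns the three groups in order; &. covers the no-match case.
user, password, host = url.match(anchored)&.captures

# A URL with a different scheme fails right away.
other_matches = "https://example.com/".match?(anchored)

puts [user, password, host].inspect
puts other_matches
```

You can paste both versions into regex101 to compare step counts on your own samples.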

Zachary Haber