I am trying to match a URL in Java using the following regex:

^http(s*):\\/\\/.+:[1-65535]/v2/.+/component/.+$

The test fails for this URL: https://box:1234/v2/something/component/a/b

I suspect it's the number range that's causing it. Can you help me understand what I am missing here?
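A minimal reproduction of the failure, assuming the pattern is compiled with `java.util.regex.Pattern` and checked with `matches()` (the class and method names are hypothetical). Comparing the four-digit port with a one-digit port hints that the port part of the pattern only ever consumes a single character.

```java
import java.util.regex.Pattern;

public class OriginalPatternCheck {
    // The regex from the question, written as a Java string literal.
    private static final Pattern ORIGINAL =
            Pattern.compile("^http(s*):\\/\\/.+:[1-65535]/v2/.+/component/.+$");

    // True if the whole URL matches the original pattern.
    public static boolean matchesOriginal(String url) {
        return ORIGINAL.matcher(url).matches();
    }

    public static void main(String[] args) {
        // The URL from the question: a four-digit port.
        System.out.println(matchesOriginal("https://box:1234/v2/something/component/a/b")); // false
        // The same URL with a one-digit port.
        System.out.println(matchesOriginal("https://box:5/v2/something/component/a/b"));    // true
    }
}
```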

James Raitsev
  • Is there a reason why you do not want to allow port 0? – crackmigg Feb 14 '13 at 19:47
  • You should also be careful with the dot `.` combined with `+` or `*`, because it is greedy and matches everything it can reach, including slashes, e.g. `../../../../../`. It would be better to use `[^/]+` instead of `.+` in your regex. – crackmigg Feb 14 '13 at 19:52
  • If you want it the hard way, you could use [this](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url). – crackmigg Feb 14 '13 at 19:58
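A sketch of crackmigg's `[^/]+` suggestion, assuming `https?` for the scheme and a loose `[0-9]+` for the port so that only the host and path handling differ (class and method names are hypothetical). The trailing `.+` is kept because `component/a/b` spans more than one segment. In Java, `/` needs no escaping inside a regex, so the `\\/` sequences are dropped.

```java
import java.util.regex.Pattern;

public class SegmentPatternDemo {
    // ".+" for the host and the segment after /v2/: it may also consume '/'.
    private static final Pattern DOT_PLUS =
            Pattern.compile("^https?://.+:[0-9]+/v2/.+/component/.+$");
    // "[^/]+" stops at the first '/', so each part stays within one segment.
    // The trailing ".+" is kept so that "component/a/b" still matches.
    private static final Pattern NO_SLASH =
            Pattern.compile("^https?://[^/]+:[0-9]+/v2/[^/]+/component/.+$");

    public static boolean matchesDotPlus(String url) {
        return DOT_PLUS.matcher(url).matches();
    }

    public static boolean matchesNoSlash(String url) {
        return NO_SLASH.matcher(url).matches();
    }

    public static void main(String[] args) {
        String ok = "https://box:1234/v2/something/component/a/b";
        // Extra path segments smuggled in before the port.
        String odd = "https://box/../other:1234/v2/something/component/a/b";
        System.out.println(matchesDotPlus(ok) + " " + matchesNoSlash(ok));   // true true
        System.out.println(matchesDotPlus(odd) + " " + matchesNoSlash(odd)); // true false
    }
}
```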

3 Answers

See http://www.regular-expressions.info/numericranges.html. You can't just write [1-65535] to match a number from 1 to 65535. That is a character class, which matches a single character: any digit from 1 to 6, or 5, or 3.
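A quick check of how Java reads `[1-65535]`, with hypothetical class and method names: as a character class it matches exactly one character, so a whole-string match succeeds only for a single digit from 1 to 6.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // [1-65535] is a character class: it matches exactly ONE character
    // from the set {1-6, 5, 5, 3, 5}, i.e. one of the digits 1 to 6.
    private static final Pattern CLASS = Pattern.compile("[1-65535]");

    public static boolean matchesWhole(String s) {
        return CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("5"));    // true: one digit in 1-6
        System.out.println(matchesWhole("7"));    // false: 7 is not in the set
        System.out.println(matchesWhole("1234")); // false: more than one character
    }
}
```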

The expression you need is quite verbose, in this case:

([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])

(Credit to http://utilitymill.com/utility/Regex_For_Range)
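A sketch plugging that alternation into the question's pattern with only the port part changed (`http(s*)` is left as in the question, the unnecessary `\\/` escapes are dropped, and the class and method names are hypothetical). It checks a few boundary values around 1 and 65535.

```java
import java.util.regex.Pattern;

public class PortRangeDemo {
    // Alternation for 1..65535, copied from the answer.
    private static final String PORT =
            "([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])";
    // The question's pattern with only the port part replaced.
    private static final Pattern URL =
            Pattern.compile("^http(s*)://.+:" + PORT + "/v2/.+/component/.+$");

    // Builds the test URL around the given port and matches the whole string.
    public static boolean matchesPort(String port) {
        return URL.matcher("https://box:" + port + "/v2/something/component/a/b").matches();
    }

    public static void main(String[] args) {
        for (String port : new String[] {"0", "1", "1234", "8080", "65535", "65536", "99999"}) {
            System.out.println(port + " -> " + matchesPort(port));
        }
    }
}
```

With these inputs, 0, 65536 and 99999 are rejected, while 1, 1234, 8080 and 65535 are accepted.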

Another issue is your http(s*). That needs to be https?, because in its current form it would also allow httpsssssssss://. If your regex takes public input, this is a concern.
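A small comparison of the two scheme prefixes, assuming a trivial `.*` after `://` so that only the scheme part matters (class and method names are hypothetical).

```java
import java.util.regex.Pattern;

public class SchemeDemo {
    // "s*" allows any number of 's' characters, including many.
    private static final Pattern S_STAR = Pattern.compile("^http(s*)://.*");
    // "s?" allows zero or one 's'.
    private static final Pattern S_OPT = Pattern.compile("^https?://.*");

    public static boolean matchesStar(String s) {
        return S_STAR.matcher(s).matches();
    }

    public static boolean matchesOptional(String s) {
        return S_OPT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://box", "https://box", "httpsss://box"}) {
            System.out.println(s + ": s* -> " + matchesStar(s) + ", s? -> " + matchesOptional(s));
        }
    }
}
```

Only `httpsss://box` separates the two: `s*` accepts it, `s?` rejects it.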

Fredrick Brennan
  • Using a regex to match a numeric range is ugly. Just matching normally, then extracting the port and comparing it, is cleaner and more maintainable. – nhahtdh Feb 14 '13 at 19:49
  • Doesn't mean that you must provide one. You can always suggest better ways of doing it (if the better way is regex, then use regex; if the better way is something else, then use that). – nhahtdh Feb 14 '13 at 19:56
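A sketch of nhahtdh's approach: capture the port as plain digits with a regex and check the range in code. The `[^/:]+` host pattern, the `{1,5}` digit limit and the names `PortCheckDemo`/`isValid` are assumptions for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortCheckDemo {
    // Group 1 captures the port digits; the numeric range is checked in code.
    // Limiting the port to 5 digits keeps Integer.parseInt from overflowing.
    private static final Pattern URL =
            Pattern.compile("^https?://[^/:]+:([0-9]{1,5})/v2/.+/component/.+$");

    public static boolean isValid(String url) {
        Matcher m = URL.matcher(url);
        if (!m.matches()) {
            return false;
        }
        int port = Integer.parseInt(m.group(1));
        return port >= 1 && port <= 65535;
    }

    public static void main(String[] args) {
        for (String port : new String[] {"0", "1234", "8080", "65535", "65536"}) {
            String url = "https://box:" + port + "/v2/something/component/a/b";
            System.out.println(port + " -> " + isValid(url));
        }
    }
}
```

Changing the allowed range (for example to include port 0) is then a one-line edit to the comparison rather than a rewrite of the regex. Note that this sketch accepts leading zeros such as `00080`, since `parseInt` reads that as 80.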

^http(s*) is wrong: it would also allow httpssssss://...

You need ^https?

This doesn't affect the given test though.

Adrián

The character class [1-65535] basically means a single digit from 1 to 6, or 5, or 5, or 3, or 5. It is still a valid regex, but it matches only one character, so you need a + (or *) after it.

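To match the port more precisely, you could use [1-6][0-9]{0,4}?. That gets you closer, but it also allows e.g. 69999, and it rejects any port whose first digit is 7, 8 or 9 (such as 8080). The {m,n} quantifier specifies how often the preceding element may be repeated (m to n times); the trailing ? only makes it reluctant (lazy), which does not change what a fully anchored pattern like this one accepts.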

Also take care of that (s*) thing the others pointed out!

That would result in: ^https?:\\/\\/.+:[1-6][0-9]{0,4}?/v2/.+/component/.+$
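A check of the suggested pattern against a few ports (class and method names are hypothetical). It accepts the test URL and 69999, and it rejects 8080, since the first port digit is limited to 1-6.

```java
import java.util.regex.Pattern;

public class SuggestedPatternDemo {
    // The final pattern from this answer, as a Java string literal.
    private static final Pattern SUGGESTED =
            Pattern.compile("^https?:\\/\\/.+:[1-6][0-9]{0,4}?/v2/.+/component/.+$");

    // Builds the test URL around the given port and matches the whole string.
    public static boolean matchesPort(String port) {
        return SUGGESTED.matcher("https://box:" + port + "/v2/something/component/a/b").matches();
    }

    public static void main(String[] args) {
        for (String port : new String[] {"1234", "69999", "8080"}) {
            System.out.println(port + " -> " + matchesPort(port));
        }
    }
}
```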

v01pe