Regex to extract valid HTTP or HTTPS URLs

Question

I'm currently having some issues with a regex to extract a URL.

I want my regex to match URLs such as:

http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000

Through some tutorials, I've learned that this regex will find all of the above: ^(http|https)\://.*$ However, it will also take http://local:1000;http://invalid http://khttp:// as a single string, when it shouldn't take it at all.

I understand that my expression isn't written to exclude this, but my issue is I cannot think of how to write it so it checks for this scenario.

Any help is greatly appreciated!

Edit:

Looking at my issue, it seems I could solve it by checking that '//' doesn't occur in my string after the initial http:// or https://. Any ideas on how to implement that?

Sorry, this will be done in Java.

I also need to add the following constraint: a string such as http://local:80/test:90 should fail because of the duplicate port. In other words, a valid string may contain only two : symbols in total: one after http/https and one before the port.

Hi, if the string contains multiple urls such as http://http://k.http://blah it shouldn't be found as valid in my regex — user2019260, Jan 28 '13 at 19:10
Looking at my issue, it seems that I could eliminate my issue as long as I can implement a check to make sure '//' doesn't occur in my string after the initial http:// or https://, any ideas on how to implement? — user2019260, Jan 28 '13 at 19:23
Please read the [regex] tag's description: "Please also include a tag specifying the programming language or tool you are using." — JDB, Jan 28 '13 at 19:25

score 1 · Answer 1 · answered Jan 28 '13 at 19:22

Check whether your programming language already has a URL parser. For example, PHP has parse_url().
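In Java, one option along these lines is java.net.URI from the standard library. The following is only a sketch under assumptions: the class name UrlParserSketch and the acceptance rule (scheme is http or https, and a host is present) are made up for illustration and are not part of the original answer.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the name and the acceptance rule are assumptions.
class UrlParserSketch {

    // True when the input parses as a URI with an http/https scheme and a host.
    static boolean isHttpUrl(String input) {
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Illegal characters (for example a space) end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://local:1000",
            "http://local:1000;http://invalid http://khttp://"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isHttpUrl(s));
        }
    }
}
```

Keep in mind that a parser only checks URI syntax. A string such as http://local:80/test:90 is a syntactically valid URI and still passes this check, so the extra colon constraint from the question would need a separate test.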

Greg

MikeM · Accepted Answer · 2013-01-28T20:02:58.507

This will only produce a match if there is no :// after its first appearance in the string.

^https?:\/\/(?!.*:\/\/)\S+

Note that extracting a valid URL from within a string is very complex (see "In search of the perfect URL validation regex"), so the above does not attempt to do that.
It just matches the protocol and the non-space characters that follow.

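In Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches only when no further "://" follows the initial http:// or https://
Pattern reg = Pattern.compile("^https?:\\/\\/(?!.*:\\/\\/)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
    System.out.println(m.group()); // the matched URL
} else {
    System.out.println("No match");
}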
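As a quick check, the same pattern can be run against the inputs from the question. This is only a sketch: the class and method names are hypothetical, and the slashes are left unescaped because Java does not require escaping them.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern above; names are assumptions.
class SingleSchemeCheck {

    // Same pattern as above, written without the optional "\/" escapes.
    private static final Pattern SINGLE_SCHEME =
            Pattern.compile("^https?://(?!.*://)\\S+");

    // True when the input starts with http:// or https:// and
    // no second "://" appears anywhere later in the string.
    static boolean hasSingleScheme(String input) {
        return SINGLE_SCHEME.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://stackoverflow.com",
            "http://local:1000",
            "https://local:1000",
            "http://local:1000;http://invalid http://khttp://",
            "http://http://k.http://blah"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + hasSingleScheme(s));
        }
    }
}
```

The first four inputs are accepted and the last two are rejected. A string such as http://local:80/test:90 is still accepted, since this pattern only rules out a second ://.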

edited Jan 28 '13 at 20:02

answered Jan 28 '13 at 19:25

MikeM

Seems like this is what I need, any idea how to do this in java? – user2019260 Jan 28 '13 at 19:45
@Greg. Yes, that's great, but it assumes that you have already got the url. – MikeM Jan 28 '13 at 19:55
Mike, thank you, this works great. One question: if I wanted to add to the constraints that a second colon in the string also makes it invalid (e.g. "https://local:800/test:5"), how would I go about doing that? – user2019260 Jan 28 '13 at 21:21
@user2019260. If you mean a _third_ colon, you could use `^https?:\\/\\/(?!.*:(.*:|\\/\\/))\\S+` This will disallow `://` or two `:` in the string after `http://`. – MikeM Jan 28 '13 at 22:36
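
A small sketch of how the stricter pattern from the last comment could be exercised in Java. The class and method names are hypothetical, and the slashes are written unescaped, which is equivalent in Java.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from the comment; names are assumptions.
class ColonLimitCheck {

    // After the scheme, reject any ':' that is followed by "//" or by another ':'.
    private static final Pattern AT_MOST_ONE_MORE_COLON =
            Pattern.compile("^https?://(?!.*:(.*:|//))\\S+");

    static boolean isAccepted(String input) {
        return AT_MOST_ONE_MORE_COLON.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://local:800",
            "https://local:800/test:5",
            "http://stackoverflow.com/questions/ask",
            "http://http://k.http://blah"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAccepted(s));
        }
    }
}
```

Here https://local:800 and http://stackoverflow.com/questions/ask are accepted, while https://local:800/test:5 and http://http://k.http://blah are rejected.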

score 0 · Answer 3 · answered Jan 28 '13 at 19:27

From http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This may need to change depending on the programming language or tool.
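
For Java, a rough translation of that pattern could look like the sketch below, with the surrounding / delimiters dropped and backslashes doubled for the string literal. The class and method names are hypothetical. Note that the pattern has no provision for a port, so the http://local:1000 style inputs from the question do not match, and because the scheme group is optional, a bare stackoverflow.com does match.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the tutorial pattern; names are assumptions.
class TutorialPatternCheck {

    // The pattern quoted above, translated to a Java string literal.
    private static final Pattern TUTORIAL_URL = Pattern.compile(
            "^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$");

    static boolean matchesTutorialPattern(String input) {
        return TUTORIAL_URL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://stackoverflow.com",
            "http://local:1000",
            "stackoverflow.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesTutorialPattern(s));
        }
    }
}
```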

JDB

score 0 · Answer 4 · edited Sep 29 '22 at 15:42

/[A-Za-z]+:\/\/[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+/g

edited Sep 29 '22 at 15:42

Suraj Rao

answered Sep 29 '22 at 15:40

Konstantin XFlash Stratigenas
