
I am trying to work out a regex pattern that checks for the presence of a domain followed by / and then any further characters. For example, the string https://example.com/ is fine for me, but I want to invalidate the string https://example.com/xyz because it has a path after the domain.

So far I have come up with a pattern that checks for a string starting with https followed by any characters: https://(.*). But I have been unable to work out a pattern for the scenario described above.

Thanks in advance for your inputs :)

Pavel Smirnov
Trooper

5 Answers


Anchor the pattern so that it starts with http (or https), may end with a single /, and contains no other / after the ://:

^http(s)?://[^/]*/?$
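A minimal Java sketch of how this pattern could be applied, assuming the whole input string is the URL. `Matcher.matches()` requires the entire input to match, so the `^` and `$` anchors are redundant here but harmless. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class NoPathCheck {
    // Scheme, then no further "/" except an optional single trailing one
    private static final Pattern NO_PATH = Pattern.compile("^http(s)?://[^/]*/?$");

    // Hypothetical helper: true when the URL has nothing after the domain but an optional "/"
    static boolean hasNoPath(String url) {
        return NO_PATH.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasNoPath("https://example.com/"));    // true
        System.out.println(hasNoPath("https://example.com"));     // true
        System.out.println(hasNoPath("https://example.com/xyz")); // false
    }
}
```

Note that `[^/]*` also accepts an empty host (`https://` on its own matches) and query strings such as `https://example.com?x=1`, so this is a loose check rather than full URL validation.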
Vengleab SO
  • Cool, this looks good. I actually want to validate strings starting with https, so would this be fine: https://[^/]*/?$ – Trooper Oct 17 '19 at 06:51
  • Ran this through a matcher: it matches `https://example.com/` and does not match `https://example.com/xyz`. – Ambro-r Oct 17 '19 at 07:16

See RFC 3986, Appendix B (https://www.ietf.org/rfc/rfc3986.txt):

Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

 ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
  12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis).
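The RFC's expression could be used from Java roughly as follows. Group 5 holds the path component, so a URL has "no path" in the question's sense when that group is empty or exactly `/`. This is a sketch: the class and method names are hypothetical, and, as the quote says, the expression is meant for breaking down well-formed URI references rather than for validating them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rfc3986Path {
    // Regular expression from RFC 3986, Appendix B (backslash doubled for a Java string literal)
    private static final Pattern URI_PARTS =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns group 5, the path; it is non-null whenever the pattern matches
    static String path(String uri) {
        Matcher m = URI_PARTS.matcher(uri);
        if (!m.find()) {
            // Defensive only: every part is optional, so the pattern always matches
            throw new IllegalArgumentException(uri);
        }
        return m.group(5);
    }

    // Hypothetical helper: accept an empty path or a lone "/"
    static boolean hasNoPath(String uri) {
        String p = path(uri);
        return p.isEmpty() || p.equals("/");
    }

    public static void main(String[] args) {
        System.out.println(path("https://example.com/xyz"));      // /xyz
        System.out.println(hasNoPath("https://example.com/"));    // true
        System.out.println(hasNoPath("https://example.com/xyz")); // false
    }
}
```

Because the query and fragment land in later groups, `https://example.com/?q=1` still counts as having no path under this check.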

Forgery

I would approach this in two steps. First, I would match the domain with the following regex pattern:

http(s)?://(?:[\w0-9](?:[\w0-9-]{0,61}[\w0-9])?\.)+[\w0-9][\w0-9-]{0,61}[\w0-9](/)?

Once you have the domain, take the substring that remains; if it is more than just a "/" (e.g. "/xyz"), invalidate the string as per your requirement.

For example:

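    public class UrlPathSplit { // illustrative class name
        public static void main(String[] args) {
            String urlString = "https://example.com/";
            // Scheme + domain + optional trailing "/" (the pattern above, escaped for Java)
            String regex = "http(s)?://(?:[\\w0-9](?:[\\w0-9-]{0,61}[\\w0-9])?\\.)+[\\w0-9][\\w0-9-]{0,61}[\\w0-9](/)?";
            // Splitting on the match keeps whatever follows the domain; split() drops
            // trailing empty strings, so a URL without a path yields an empty array
            String[] url = urlString.split(regex);
            if (url.length > 1) {
                System.out.println(urlString + " has a path.");
            } else {
                System.out.println(urlString + " does not have a path.");
            }
        }
    }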
Ambro-r
  • Do you realize that `http[s]` will match "https" _only_ and is the same as just `https`? – Thomas Oct 17 '19 at 07:28
  • Finger trouble; it should have been `(s)?`, an optional capturing group (zero or one "s"). Updated. – Ambro-r Oct 17 '19 at 07:41
  • @Thomas, though if you are really being pedantic, neither this expression nor any of the others proposed will work if the `http` contains any uppercase characters, so ideally `.toLowerCase()` should be applied to the `urlString` first. – Ambro-r Oct 17 '19 at 07:45
  • Well, that's true :) - One could make the expression case-insensitive though, e.g. by prepending `(?i)` ;) – Thomas Oct 17 '19 at 07:52
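A small sketch of the two case-insensitive options mentioned in these comments, using the first answer's pattern (`^http(s)?://[^/]*/?$`) as an example: the inline `(?i)` flag and the equivalent `Pattern.CASE_INSENSITIVE` compile flag. Lower-casing the input first, as suggested above, would also work. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveCheck {
    // Inline flag: (?i) turns on case-insensitive matching for the rest of the pattern
    static final Pattern INLINE = Pattern.compile("(?i)^http(s)?://[^/]*/?$");

    // Same effect via the compile-time flag
    static final Pattern FLAG =
        Pattern.compile("^http(s)?://[^/]*/?$", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String url = "HTTPS://Example.com/";
        System.out.println(INLINE.matcher(url).matches()); // true
        System.out.println(FLAG.matcher(url).matches());   // true
        // Without either flag the uppercase scheme does not match
        System.out.println(Pattern.matches("^http(s)?://[^/]*/?$", url)); // false
    }
}
```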

Here is a regex that matches only URLs without a path; any URL it does not match is one you need to invalidate.

^https?:\/\/(www\.)?([^:\/\n?]+)\/?$

Hope this helps!
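A quick Java sketch exercising this pattern; the class name and sample inputs are made up for illustration. The `\/` escapes are not needed in Java (`/` has no special meaning in a Java regex), so they are dropped here without changing what the pattern matches. Because the host class `[^:\/\n?]+` excludes `:`, a URL with an explicit port is rejected as well.

```java
import java.util.regex.Pattern;

public class WwwAwareCheck {
    // Same pattern as above, with the unnecessary "\/" escapes removed
    static final Pattern NO_PATH = Pattern.compile("^https?://(www\\.)?([^:/\\n?]+)/?$");

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/",      // matches: no path
            "https://www.example.com",   // matches: optional "www." prefix
            "https://example.com/xyz",   // no match: has a path
            "https://example.com:8443/"  // no match: ':' is excluded from the host class
        };
        for (String s : samples) {
            System.out.println(s + " -> " + NO_PATH.matcher(s).matches());
        }
    }
}
```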

Lahiru Udana

Please try the regex below. It might solve your issue:

http(s?)://[[a-zA-z]+\\.*\\/
pks
  • `[A-z]` [matches more than just ASCII letters](https://stackoverflow.com/questions/29771901/why-is-this-regex-allowing-a-caret/29771926#29771926). – Wiktor Stribiżew Oct 17 '19 at 08:55