Is there a known JavaScript regular expression that matches an entire URL connection string?

protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2

I'm looking for a single regular expression that would help me translate such a connection string into an object:

{
    protocol: 'protocol',
    user: 'user',
    password: 'password',
    host: 'hostname:12345',
    hostname: 'hostname',
    port: 12345,
    segments: ['segment1', 'segment2'],
    params: {
        p1: 'val1',
        p2: 'val2'
    }
}

Also, I want every single part of the connection string to be optional, so the missing parameters can be filled by values from the environment.

Examples:

  • protocol://
  • server:12345
  • :12345 - for the port only
  • user:password@
  • user@
  • :password@
  • /segment1
  • ?p1=val1
  • and so on...
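Once a string is parsed, filling the missing parts from the environment (as described above) could be a plain merge. Here is a minimal sketch. `withDefaults` is a hypothetical helper, the `defaults` object could hold values read from `process.env` in Node.js, and the merge is shallow, so a `params` object from the string would replace the default one as a whole:

```javascript
// Hypothetical helper: every part the connection string left undefined
// is taken from `defaults` (e.g. values read from process.env).
// Shallow merge: nested objects such as `params` are not merged key by key.
function withDefaults(parsed, defaults) {
  var result = {};
  Object.keys(defaults).forEach(function (key) {
    result[key] = defaults[key];
  });
  Object.keys(parsed).forEach(function (key) {
    if (parsed[key] !== undefined) {
      result[key] = parsed[key]; // an explicit value in the string wins
    }
  });
  return result;
}
```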

Standard RFC 3986 rules should apply to every part as far as valid characters are concerned.
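For comparison, RFC 3986 itself (Appendix B) gives a regular expression that splits a URI reference into scheme, authority, path, query and fragment. The following is a minimal JavaScript sketch around it; `splitUri` is a hypothetical name. The pattern only splits and does not validate characters. Because it assumes the standard URI layout, it reads `user:password@` as scheme `user` with path `password@`, and `:12345` as a path. This is exactly where the "every part optional" requirement departs from the RFC:

```javascript
// Generic URI-reference splitter from RFC 3986, Appendix B.
// It never fails (every group is optional) and it does not validate characters.
var RFC3986_SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitUri(uri) {
  var m = RFC3986_SPLIT.exec(uri);
  return {
    scheme: m[2],     // "protocol"
    authority: m[4],  // "user:password@hostname:12345"
    path: m[5],       // "/segment1/segment2" (may be an empty string)
    query: m[7],      // "p1=val1&p2=val2"
    fragment: m[9]
  };
}
```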

I'm looking for something that would work in both Node.js and all browsers.

I've implemented piece-by-piece parsing in connection-string, but that approach cannot validate the input, i.e. it cannot tell whether the whole string is valid.
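Validating a piece, rather than just extracting it, means testing it against its RFC 3986 grammar with a fully anchored pattern. Below is a minimal sketch for the userinfo and port components, based on the grammar in RFC 3986 sections 3.2.1 and 3.2.3. The constant and function names are hypothetical:

```javascript
// userinfo = *( unreserved / pct-encoded / sub-delims / ":" )   (RFC 3986, 3.2.1)
// unreserved = ALPHA DIGIT - . _ ~ ; sub-delims = ! $ & ' ( ) * + , ; =
var USERINFO = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*$/;
// port = *DIGIT   (RFC 3986, 3.2.3)
var PORT = /^[0-9]*$/;

function isValidUserInfo(userInfo) {
  return USERINFO.test(userInfo);
}

function isValidPort(port) {
  return PORT.test(port);
}
```

The RFC grammar allows an empty port and puts no upper bound on it, so a stricter check would still need a separate range test such as 1 to 65535.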

vitaly-t
  • A dupe of [How to parse a URL?](https://stackoverflow.com/questions/6168260/how-to-parse-a-url) – Wiktor Stribiżew Jul 13 '17 at 06:53
  • @WiktorStribiżew there is no answer there that would support all parts of the URL being optional, as per my example. – vitaly-t Jul 13 '17 at 06:56
  • I don't think regex is a good idea for this problem. Why don't you just manually parse the URL and then construct the required object? – Dat Nguyen Jul 13 '17 at 07:27
  • Why do you want to use a regular expression for this case? Why not use the function, for example this one: http://locutus.io/php/url/parse_url/ ? – Sergey Khalitov Jul 13 '17 at 07:28
  • @SergeyKhalitov I don't know if it works, and if it does work with the conditions I described, it would make an answer rather than a question about why I don't use it; I had obviously never seen it before. – vitaly-t Jul 13 '17 at 07:33
  • @DatNguyen I've done it in [connection-string](https://github.com/vitaly-t/connection-string), but the problem there is that it cannot parse the whole thing for validity, and I want to be able to tell if the connection string is in fact valid. – vitaly-t Jul 13 '17 at 07:35
  • This question isn't very well specified. What exactly does "and so on..." mean? In `server:12345`, is `12345` the port or the password? People are encouraged to use passwords with non-alphanumeric characters - what if the password contains `/`, `:` or `@`? Can the username contain these characters? And why does the URL contain a password anyway - is that not readable by a third party? – David Knipe Jul 13 '17 at 22:19
  • I suppose it depends on where you are in the development cycle. If this is an existing API with lots of users, then you need to find out exactly what it does, or at least what your users are doing. Is it documented? Accurately? Does the documentation promise to accept all these different strings? On the other hand, if this is a new API, decide exactly how you want it to behave, then write it down. But in this case I'd recommend changing it completely. Do you really need all these parameters to be optional? And it's unusual to have optional parameters before the `?`. – David Knipe Jul 13 '17 at 22:57

2 Answers

Something like this?

function url2obj(url) {
    // Groups: 1 = protocol, 2 = user[:password], 3 = host[:port],
    //         4 = path without the leading "/", 5 = query
    var pattern = /^(?:([^:\/?#\s]+):\/{2})?(?:([^@\/?#\s]+)@)?([^\/?#\s]+)?(?:\/([^?#\s]*))?(?:[?]([^#\s]+))?\S*$/;
    var matches = url.match(pattern);
    if (!matches) {
        return null; // every part is optional, so only whitespace makes the match fail
    }

    var params = {};
    if (matches[5] != undefined) {
        matches[5].split('&').forEach(function (x) {
            var a = x.split('=');
            params[a[0]] = a[1];
        });
    }

    // "user:password" -> ["user", "password"]
    var userInfo = matches[2] != undefined ? matches[2].split(':') : [];
    // Split off the port only at a colon followed by digits up to the end,
    // so "[FFF::12]:345" -> ["[FFF::12]", "345"]
    var hostParts = matches[3] != undefined ? matches[3].split(/:(?=\d+$)/) : [];

    return {
        protocol: matches[1],
        user: userInfo[0],
        password: userInfo[1],
        host: matches[3],
        hostname: hostParts[0],
        port: hostParts[1] != undefined ? Number(hostParts[1]) : undefined,
        segments: matches[4] != undefined ? matches[4].split('/') : undefined,
        params: params
    };
}

console.log(url2obj("protocol://user:password@hostname:12345/segment1/segment2?p1=val1&p2=val2"));
console.log(url2obj("http://hostname"));
console.log(url2obj(":password@"));
console.log(url2obj("?p1=val1"));
console.log(url2obj("ftp://usr:pwd@[FFF::12]:345/testIP6"));

A test for the regex pattern is available on regex101.
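The query-string step in `url2obj()` keeps values percent-encoded and drops everything after a second `=` in a pair. Below is a minimal alternative sketch that assumes percent-decoding is wanted. `parseParams` is a hypothetical helper that would receive `matches[5]`:

```javascript
// Hypothetical replacement for the query-string step in url2obj():
// keys and values are percent-decoded, a missing "=" gives an empty value,
// and only the first "=" separates key from value.
function parseParams(query) {
  var params = {};
  if (!query) {
    return params;
  }
  query.split('&').forEach(function (pair) {
    if (!pair) {
      return; // skip empty pieces, e.g. in "a=1&&b=2"
    }
    var eq = pair.indexOf('=');
    var key = eq === -1 ? pair : pair.slice(0, eq);
    var value = eq === -1 ? '' : pair.slice(eq + 1);
    // decodeURIComponent throws a URIError on malformed input such as "%zz"
    params[decodeURIComponent(key)] = decodeURIComponent(value);
  });
  return params;
}
```

It does not turn `+` into a space, since that is form encoding rather than RFC 3986. Where `URLSearchParams` is available it is a ready-made alternative, though older browsers such as Internet Explorer lack it.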

LukStorms
  • This is a brilliant answer, thank you! The only issue I'm having with it so far is that it cannot recognize `host`/`hostname` correctly when it is an IPv6 address, which in URLs is written inside square brackets, like `[12ab:1234::]`; it can be as short as `[::]` and as long as 45 characters. – vitaly-t Jul 14 '17 at 01:00
  • Never mind, I've fixed it myself. Again, great answer, thank you! – vitaly-t Jul 14 '17 at 02:04
  • @vitaly-t Oh right, to get the hostname from the host there's a split on `:`, which would give the wrong result for an IP6 since those contain that character. I guess you figured out how to extract the whole IP6 from capture group 3. Btw, I tweaked the regex a tiny bit. – LukStorms Jul 14 '17 at 08:03
  • The answer provided above proved a very valuable contribution to finalizing [connection-string](https://github.com/vitaly-t/connection-string) into the powerful module it is today :) Thank you once again! – vitaly-t Mar 26 '19 at 05:42
  • @vitaly-t Heh, that's amazing. Good job! :) Btw, it rarely happens that someone who voted on an answer would show what they did with it. So this made me smile. – LukStorms Mar 27 '19 at 08:08
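Following up on the IPv6 discussion in the comments, one way to split a host into hostname and port without tripping over the colons inside an IPv6 literal is to match the bracketed form explicitly. Here is a minimal sketch. `splitHost` is a hypothetical name, it keeps the brackets in `hostname`, it does not range-check the port, and it returns `null` for strings it cannot split unambiguously, such as an unbracketed IPv6 address:

```javascript
// Hypothetical helper: "host[:port]" or "[IPv6]:port" -> { hostname, port }.
// The first alternative matches a bracketed IP literal, the second a name
// without colons; an optional ":digits" suffix is the port.
function splitHost(host) {
  var m = /^(\[[^\]]*\]|[^:]*)(?::(\d*))?$/.exec(host);
  if (!m) {
    return null; // e.g. "fe80::1" without brackets
  }
  return {
    hostname: m[1] || undefined, // ":12345" has no hostname
    port: m[2] ? Number(m[2]) : undefined
  };
}
```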

A sample pattern for Java datasource (JDBC) connection URLs, if needed:

^(?:(?:(jdbc)\:{1})?(?:(\w+):/{2})?(?:([^@\/?!\"':#\s]+(?::\w+)?)@)?)?(?:([^@\/?!\"':#\s]+(?::\d+)?)(?=(?:$)|(?:/)))?(?:/([^@?!\"':#\s]*)(?=(?:$)|(?:\?)))?(?:[?]([^#?!\s]+))?\S*$

Online Demo
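The same pattern can also be used from JavaScript. The following is a minimal sketch in which the pattern is transcribed into a regex literal with each bare `/` escaped as `\/` and is otherwise unchanged. `parseJdbcUrl` and its field names are hypothetical labels for the six capture groups:

```javascript
// The JDBC pattern above as a JavaScript regex literal (bare "/" escaped).
var JDBC_URL = /^(?:(?:(jdbc)\:{1})?(?:(\w+):\/{2})?(?:([^@\/?!\"':#\s]+(?::\w+)?)@)?)?(?:([^@\/?!\"':#\s]+(?::\d+)?)(?=(?:$)|(?:\/)))?(?:\/([^@?!\"':#\s]*)(?=(?:$)|(?:\?)))?(?:[?]([^#?!\s]+))?\S*$/;

function parseJdbcUrl(url) {
  var m = JDBC_URL.exec(url);
  if (!m) {
    return null;
  }
  return {
    prefix: m[1],      // "jdbc", if present
    subprotocol: m[2], // e.g. "mysql"
    userInfo: m[3],    // "user" or "user:password"
    host: m[4],        // "hostname" or "hostname:port"
    database: m[5],    // path after the host
    query: m[6]        // raw query string
  };
}
```

Note that the pattern only accepts word characters (`\w`) after the colon in the user part, so passwords with other symbols will not match as userinfo.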

S.B
Gen Eva