
I need a regex to break a given url into two parts.

part1 --> the domain (including the protocol [http or https] if present).

part2 --> the remainder

Thus something like this:

example 1

let url = "https://www.example.com/asdasd/123asdsd/sdasd?bar=1"

regex returns

domain = https://www.example.com

remaining path = /asdasd/123asdsd/sdasd?bar=1


example 2

let url = "www.example.com/asdasd/123asdsd/sdasd?bar=1"

regex returns

domain = www.example.com

remaining path = /asdasd/123asdsd/sdasd?bar=1


example 3

let url = "example.com/asdasd/123asdsd/sdasd?bar=1"

regex returns

domain = example.com

remaining path = /asdasd/123asdsd/sdasd?bar=1


example 4

let url = "http://example.com"

regex returns

domain = http://example.com

remaining path = null

rsturim

3 Answers


I would recommend using the URL interface instead of a regex. It cannot parse examples 2 and 3 as written, because the constructor throws a TypeError when the string has no scheme, but for a full URL it can pull out all the bits you require.

From MDN:

The URL interface is used to parse, construct, normalize, and encode URLs. It works by providing properties which allow you to easily read and modify the components of a URL. You normally create a new URL object by specifying the URL as a string when calling its constructor, or by providing a relative URL and a base URL. You can then easily read the parsed components of the URL or make changes to the URL.

Example for your requirements:

let url = new URL("https://www.example.com/asdasd/123asdsd/sdasd?bar=1");

console.log("domain - " + url.origin);
console.log("remaining path - " + url.pathname + url.search);
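To cover the scheme-less inputs from examples 2 and 3 as well, one option is to add a scheme temporarily before parsing. The following is only a sketch: `splitWithUrl` is a hypothetical helper name, and treating scheme-less input as `http` purely for parsing is an assumption, not something the URL interface does for you. Note that URL also normalizes its input (for example, it lowercases the host and turns an empty path into "/").

```javascript
// Hypothetical helper (not part of the URL API): split a URL string into
// its domain and the remainder, using the URL interface for the parsing.
function splitWithUrl(input) {
  const hasScheme = /^https?:\/\//i.test(input);

  // Assumption: scheme-less input gets "http://" prepended only so that
  // the URL constructor accepts it; the scheme is left out of the result.
  const parsed = new URL(hasScheme ? input : "http://" + input);

  const domain = hasScheme ? parsed.origin : parsed.host;
  const rest = parsed.pathname + parsed.search + parsed.hash;

  // URL reports an empty path as "/", so map that to null (example 4).
  return { domain: domain, path: rest === "/" ? null : rest };
}

console.log(splitWithUrl("www.example.com/asdasd/123asdsd/sdasd?bar=1"));
console.log(splitWithUrl("http://example.com"));
```

One consequence of this design is that "http://example.com/" and "http://example.com" both report a null path, since URL does not distinguish them.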
frobinsonj

Use URL.

var url = new URL("https://www.example.com/asdasd/123asdsd/sdasd?bar=1");
var domain = `${url.protocol}//${url.host}`;
var path = `${url.pathname}${url.search}`; // url.search already includes the "?" and is empty when there is no query
console.log(`domain = ${domain}`)
console.log(`remaining path = ${path}`)

Someone beat me to the punch with URL so I'll post the regex as well.

var url = "https://www.example.com/asdasd/123asdsd/sdasd?bar=1";
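// Optional protocol, then the host up to the first "/"; group 2 is whatever remains
var matches = /^((?:https?:\/\/)?[^\/]+)(.*)$/.exec(url);
var domain = matches[1];
var path = matches[2] || null; // null when nothing follows the domain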
console.log(`domain = ${domain}`)
console.log(`remaining path = ${path}`)
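For checking a regex against all four examples from the question, it can help to wrap it in a function. This is a sketch under a few assumptions: `splitWithRegex` is a hypothetical name, only `http` and `https` are recognized as protocols, and the domain is taken to end at the first "/", "?" or "#".

```javascript
// Hypothetical helper: split a URL-like string with a single regex.
function splitWithRegex(input) {
  // domain: optional http(s):// followed by everything up to the first "/", "?" or "#"
  // rest:   whatever remains, possibly empty
  const pattern = /^(?<domain>(?:https?:\/\/)?[^\/?#]+)(?<rest>.*)$/i;
  const match = pattern.exec(input);
  if (match === null) {
    return null; // no domain found, e.g. an empty string
  }
  const { domain, rest } = match.groups;
  return { domain: domain, path: rest === "" ? null : rest };
}

// The four inputs from the question
const examples = [
  "https://www.example.com/asdasd/123asdsd/sdasd?bar=1",
  "www.example.com/asdasd/123asdsd/sdasd?bar=1",
  "example.com/asdasd/123asdsd/sdasd?bar=1",
  "http://example.com",
];

for (const example of examples) {
  console.log(example, splitWithRegex(example));
}
```

Unlike the URL interface, this does no validation or normalization: any leading run of characters without "/", "?" or "#" is reported as the domain.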
I wrestled a bear once.

Here is a step-by-step JavaScript version. I hope it helps you understand how the pieces fit together.

// Remove the protocol (http:// or https://), if present
let regEx = /^https?:\/\//i;
let url = "https://www.example.com/asdasd/123asdsd/sdasd?bar=1"
let path = url.replace(regEx, "");
console.log("path = " + path);

// Match the domain: everything up to and including the first "/"
let regEx2 = /^(.*?\/)/;
let match = path.match(regEx2);
if (match) {
  // The route is whatever follows the domain
  let route = "/" + path.replace(regEx2, "");
  console.log("route", route);

  // Extract the domain, dropping its trailing "/"
  let domainUrl = match[0].replace("/", "");
  console.log("domainUrl = ", domainUrl);
}
Saqib