20

I am trying to remove

http://localhost:7001/

part from

http://localhost:7001/www.facebook.com

to get the output as

www.facebook.com

What regular expression can I use to achieve this?

Sunil Garg
mdp
  • how is that url even being generated? it doesn't seem right... – lbstr Jul 18 '12 at 21:43
  • I don't know why my question got downvoted when I haven't gotten a perfect answer to it yet. – mdp Jul 18 '12 at 21:56
  • @Uppi probably because you're asking for a solution while showing no effort at an attempt yourself. – sachleen Jul 18 '12 at 22:00
  • I searched the web for a long time, but I wasn't able to find a proper answer. That's why I posted here. – mdp Jul 18 '12 at 22:02
  • Maybe find the first occurrence of a period, and grab the rest of the string from there plus everything before that first period and the previous / (or the beginning of the string if there isn't one...). Will your URLs be in a consistent format? That is, will they all begin with http://? –  Jul 18 '12 at 22:17
  • Yes, they all begin with http:// or https://. – mdp Jul 18 '12 at 22:31

11 Answers

36

You don't need any library or REGEX

var url = new URL('http://localhost:7001/www.facebook.com')
console.log(url.pathname.slice(1)) // pathname alone is "/www.facebook.com"; slice(1) drops the leading "/"

https://developer.mozilla.org/en-US/docs/Web/API/URL
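As a minimal sketch of the same idea, the helper below keeps the query string and fragment as well as the path. The name `stripOrigin` and the second sample URL are assumptions made for illustration; only `URL`, `pathname`, `search`, and `hash` come from the URL API itself.

```javascript
// Hypothetical helper: return everything after "scheme://host[:port]/".
function stripOrigin(input) {
  const url = new URL(input);
  // pathname always starts with "/", so slice(1) removes it.
  // search and hash are empty strings when the URL has neither.
  return url.pathname.slice(1) + url.search + url.hash;
}

console.log(stripOrigin('http://localhost:7001/www.facebook.com')); // www.facebook.com
console.log(stripOrigin('https://example.com/a/b?x=1#top'));        // a/b?x=1#top
```

Note that `new URL(...)` throws a `TypeError` when the input is not an absolute URL, so callers that may receive relative strings would need to guard for that.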

Ahmed Ashour
Israel Perales
16

Based on @atiruz's answer, but this is
  • shortest
  • can take https or ftp too
  • can take url with or without explicit port

url = url.replace( /^[a-zA-Z]{3,5}\:\/{2}[a-zA-Z0-9_.:-]+\//, '' );
  • In the regular expression the first '\' is not necessary; this is equivalent: `/^[a-zA-Z]{3,5}:\/{2}[a-zA-Z0-9_.:-]+\//`. Thanks for your code. – dgzornoza Jan 28 '23 at 18:08
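A quick way to check the three claims in this answer is to run the pattern over a few inputs. This is only a sketch: the sample URLs and the names `originPattern`, `samples`, and `stripped` are made up for illustration, and the pattern is the variant without the unneeded backslash.

```javascript
// Scheme of 3 to 5 letters, "://", then host characters (optionally with
// ":port"), up to and including the first "/".
const originPattern = /^[a-zA-Z]{3,5}:\/{2}[a-zA-Z0-9_.:-]+\//;

const samples = [
  'http://localhost:7001/www.facebook.com',      // http with explicit port
  'https://localhost/www.facebook.com',          // https without a port
  'ftp://files.example.com:21/www.facebook.com', // ftp with a port
];

const stripped = samples.map((s) => s.replace(originPattern, ''));
console.log(stripped); // each entry is 'www.facebook.com'
```

Because "/" is not in the character class, the match stops at the first slash after the host, so later slashes in the path are left alone.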
12

In JavaScript you can use this code:

var URL = "http://localhost:7001/www.facebook.com";
var newURL = URL.replace (/^[a-z]{4,5}\:\/{2}[a-z]{1,}\:[0-9]{1,5}\/(.*)/, '$1'); // http or https
alert (newURL);

Look at this code in action Here

Regards, Victor

atiruz
  • 1
    but if there is https then var newURL = URL.replace (/^[a-z]{5}\:\/{2}[a-z]{1,}\:[0-9]{1,4}.(.*)/, '$1'); – Asad Naeem May 09 '18 at 10:24
6

This is how I made it work without resorting to regular expressions:

var URL = "http://localhost:7001/www.facebook.com";

var URLsplit = URL.split('/');

var host = URLsplit[0] + "//" + URLsplit[2] + "/";

var newURL = URL.replace(host, '');

It might not be an elegant solution, but it should be easier to understand for those who don't have much experience with regex (like me! ugh!).
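The same split-based idea can be sketched as a small function. This is a variation rather than the exact code above: instead of rebuilding the host string and replacing it, it drops the first three pieces and joins the rest. The name `removeHostBySplit` is hypothetical, and the sketch assumes the input always has the form scheme://host/...

```javascript
// Hypothetical helper using only split/slice/join, no regex.
function removeHostBySplit(url) {
  // 'http://host:port/rest' splits into ['http:', '', 'host:port', 'rest', ...]
  const parts = url.split('/');
  // Drop scheme, the empty piece between the two slashes, and the host,
  // then re-join so any later slashes in the path are preserved.
  return parts.slice(3).join('/');
}

console.log(removeHostBySplit('http://localhost:7001/www.facebook.com')); // www.facebook.com
console.log(removeHostBySplit('https://example.com/a/b//c'));             // a/b//c
```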

DeVilFisCh
  • Note that the [URL class](https://developer.mozilla.org/en-US/docs/Web/API/URL/URL) does not work in IE and is "Experimental" as of June 2017 – Bradley Flood Jun 26 '17 at 01:02
3

For a simple regex to match any protocol, domain, and (optionally) port:

var url = 'http://localhost:7001/www.facebook.com';

// Create a regex to match protocol, domain, and (optional) port
var matchProtocolDomainHost = /^.*\/\/[^\/]+:?[0-9]?\//i;

// Replace protocol, domain, and port from url, assign to `myNewUrl`
var myNewUrl = url.replace(matchProtocolDomainHost, '');

Now myNewUrl === 'www.facebook.com'.

See demo on regex101

Bradley Flood
  • Bugs: 1) try this, it'll remove parts of the URL path: `'http://example.com/double-slash-in-url-path//oops/the-path/got-broken'.replace(/^.*\/\/[^\/]+:?[0-9]?\//i, '')` – KajMagnus Jun 06 '18 at 06:14
  • Bug 2) `[0-9]?` matches only a single digit, but port numbers typically have 4 digits – KajMagnus Jun 06 '18 at 06:15
  • Another maybe more common 1) example: `'http://example.com/do-something?then-go-to=http://kittycats.com/pics'.replace(/^.*\/\/[^\/]+:?[0-9]?\//i, '')` (that's an ok url, query strings may contain http:// ) – KajMagnus Jun 06 '18 at 06:18
3

A regex that matches the part of the URL you want to remove looks something like: /^http[s]?:\/\/.+?\//

Example of Java code (note that in Java string literals a backslash must itself be escaped, hence "\\"):

String urlWithBasePath = "http://localhost:7001/www.facebook.com";
String resultUrl = urlWithBasePath.replaceFirst("^http[s]?:\\/\\/.+?\\/", ""); // resultUrl => www.facebook.com

Example of JS code:

let urlWithBasePath = "http://localhost:7001/www.facebook.com";
let resultUrl = urlWithBasePath.replace(/^http[s]?:\/\/.+?\//, ''); // resultUrl => www.facebook.com

Example of Python code:

import re
urlWithBasePath = "http://localhost:7001/www.facebook.com"
resultUrl = re.sub(r'^http[s]?:\/\/.+?\/', '', urlWithBasePath) # resultUrl => www.facebook.com

Example of Ruby code:

urlWithBasePath = "http://localhost:7001/www.facebook.com"
resultUrl = urlWithBasePath.sub(/^http[s]?:\/\/.+?\//, '') # resultUrl => www.facebook.com

Example of PHP code:

$urlWithBasePath = "http://localhost:7001/www.facebook.com";
$resultUrl = preg_replace('/^http[s]?:\/\/.+?\//', '', $urlWithBasePath); // resultUrl => www.facebook.com

Example of C# code (you should also specify using System.Text.RegularExpressions;):

string urlWithBasePath = "http://localhost:7001/www.facebook.com";
string resultUrl = Regex.Replace(urlWithBasePath, @"^http[s]?:\/\/.+?\/", ""); // resultUrl => www.facebook.com
2

All other regular expressions here look a bit complicated? This is all that's needed: (right?)

var originSlash = /^https?:\/\/[^/]+\//i;

theUrl.replace(originSlash, '');
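To see why anchoring on `^https?:` and using `[^/]+` matters, the sketch below runs this pattern and the greedy `^.*\/\/` pattern from the earlier answer over the two problem URLs raised in that answer's comments. The variable names `greedy`, `tricky`, `results`, and `greedyResults` are assumptions for illustration.

```javascript
const originSlash = /^https?:\/\/[^/]+\//i;
// Pattern from the earlier answer; ".*" is greedy and runs to the LAST "//".
const greedy = /^.*\/\/[^\/]+:?[0-9]?\//i;

const tricky = [
  'http://example.com/double-slash-in-url-path//oops/the-path/got-broken',
  'http://example.com/do-something?then-go-to=http://kittycats.com/pics',
];

const results = tricky.map((u) => u.replace(originSlash, ''));
const greedyResults = tricky.map((u) => u.replace(greedy, ''));

console.log(results);
// [ 'double-slash-in-url-path//oops/the-path/got-broken',
//   'do-something?then-go-to=http://kittycats.com/pics' ]
console.log(greedyResults);
// [ 'the-path/got-broken', 'pics' ]
```

Note that `replace` returns a new string, so the result has to be assigned or used; the original string is not modified.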
KajMagnus
1

Alternatively, you can parse the url using as3corelib's URI class. That way you don't have to do any string manipulations, which helps to avoid making unintentional assumptions. It requires a few more lines of code, but it's a more general solution that should work for a wide variety of cases:

var url : URI = new URI("http://localhost:7001/myPath?myQuery=value#myFragment");

// example of useful properties
trace(url.scheme); // prints: http
trace(url.authority); // prints the host: localhost
trace(url.port); // prints: 7001
trace(url.path); // prints: /myPath
trace(url.query); // prints: myQuery=value
trace(url.fragment); // prints: myFragment

// build a new relative url, make sure we keep the query and fragment
var relativeURL : URI = new URI();
relativeURL.path = url.path;
relativeURL.query = url.query;
relativeURL.fragment = url.fragment;

var relativeURLString : String = relativeURL.toString();

// remove first / if any
if (relativeURLString.charAt(0) == "/") {
    relativeURLString = relativeURLString.substring(1, relativeURLString.length);
}

trace(relativeURLString); // prints: myPath?myQuery=value#myFragment
Strille
1

Instead of using regex, you could just use the browser's built-in ability to parse a URL:

var parser = document.createElement('a');
parser.href = "http://localhost:7001/www.facebook.com";
var path = parser.pathname.substring(1); // --> results in 'www.facebook.com'
klues
1

If you are just looking to remove the origin and get the rest of the URL, including hashes, query params and any characters without restrictions:

function getUrlFromPath(targetUrl) {
  const url = new URL(targetUrl);
  return targetUrl.replace(url.origin, '');
}

function main() {
  const testUrls = [
    'http://localhost:3000/test?search=something',
    'https://www.google.co.in/search?q=hello+there+obi+wan&newwindow=1&sxsrf=ALiCzsZoaZvs0CrLQEHFmmR-MdrZ2ZHW2A%3A1665462761920&source=hp&ei=6fFEY_7cNY36wAOFyqagBA&iflsig=AJiK0e8AAAAAY0T_-R12vR7P_tmmkpEqgzmoZNczbnZA&ved=0ahUKEwi-9buirNf6AhUNPXAKHQWlCUQQ4dUDCAc&uact=5&oq=hello+there+obi+wan&gs_lcp=Cgdnd3Mtd2l6EAMyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEMgUIABCABDIFCAAQgAQyBQgAEIAEOgQIIxAnOhEILhCABBCxAxCDARDHARDRAzoLCAAQgAQQsQMQgwE6CwguEIAEELEDEIMBOg4ILhCABBCxAxCDARDUAjoICAAQsQMQgwE6CwguEIAEELEDENQCOggIABCABBCxAzoICC4QsQMQgwFQAFjjE2C6FmgAcAB4A4AB1QSIAd8ZkgELMC45LjIuMC4yLjGYAQCgAQE&sclient=gws-wiz'
  ];
  testUrls.forEach(url => {
    console.log(getUrlFromPath(url));
  });
}

main();

A failsafe regex pattern to achieve this will get complex and cumbersome to come up with.
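One gap in the function above is that `new URL(...)` throws on input that is not an absolute URL. A possible guard, sketched under the assumption that such input should simply be returned unchanged (the name `stripOriginSafe` is hypothetical):

```javascript
// Hypothetical defensive variant: never throws, returns the input untouched
// when it has no parseable origin or does not literally start with it.
function stripOriginSafe(targetUrl) {
  let url;
  try {
    url = new URL(targetUrl);
  } catch (err) {
    return targetUrl; // e.g. 'www.facebook.com' is not an absolute URL
  }
  // url.origin is normalized (lowercased scheme and host), so only strip it
  // when the raw string really begins with that exact text.
  return targetUrl.startsWith(url.origin)
    ? targetUrl.slice(url.origin.length)
    : targetUrl;
}

console.log(stripOriginSafe('http://localhost:3000/test?search=something')); // /test?search=something
console.log(stripOriginSafe('www.facebook.com'));                            // www.facebook.com
```

Like the original function, this keeps the leading "/" of the path.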

-6

Just use replace

"http://localhost:7001/www.facebook.com".replace("http://localhost:7001/",'')
sachleen
  • 1
    It works for me on my local machine, but not in other environments like QA and Production, where the URLs will be different. So I want a regular expression pattern. – mdp Jul 18 '12 at 21:47
  • So how do you decide where to cut off the url? – sachleen Jul 18 '12 at 21:50
  • http://localhost:7001/www.facebook.com. I have to cut off the part in front of www.facebook.com. I will cut off based on the last / in http://localhost:7001/ – mdp Jul 18 '12 at 21:53
  • 1
    How do you know where the last `/` is? Is it the last one in the entire string? Meaning, is it safe to assume you don't have URLs like `http://localhost/www.facebook.com/test`? The hard part is not writing a regex. If you learn regex, it's quite easy. The hard part is knowing what you want. – sachleen Jul 18 '12 at 22:08