I'm performing domain validation and input masking on a url string, and am trying to strip out both the protocol and the path from the url using a regex replace. Currently, I'm using two replace functions in sequence: one to replace the protocol and then another to remove the first / and everything following it.

const stripProtocol = val.replace(/(^\w+:|^)\/\//, '');
const stripPath = stripProtocol.replace(/\w\/(.*)/, '');

// http://stackoverflow.com ==> stackoverflow.com
// http://stackoverflow.com/question/ask ==> stackoverflow.co

The first regex works perfectly, but I'm running into two problems. First, the regex used for stripPath also removes the \w character immediately preceding the first slash. Second, this validation is for an input field mask, meaning it gets executed on every keystroke and then replaces the user's input with the stripped-down value. Therefore, I can't simply match the first occurrence of a / in the second regex, because when the user begins typing a URL that starts with a protocol, for example http://, everything after the first protocol slash would be removed. I tried a variation on the lookbehind alternative mentioned in this answer, but had no luck.

john_mc
  • Hello John, please give this a try: var re = /(\w+.com)/g; var str = 'http://stackoverflow.com/question/ask'; var myArray = str.match(re); console.log(myArray[0]); Basically, that will always get the domain and TLD. – Ravi Gehlot Dec 29 '17 at 19:33
  • Ravi - it needs to work for any potential top level domain, so just checking for .com isn't sufficient. In place of what you have matching ```.com``` currently, I need to match any characters that are not a ```:``` or a ```/``` – john_mc Dec 29 '17 at 19:35
  • Please use re = /(\w+\.[a-z]{3})/g instead. That way you can match on any TLD. – Ravi Gehlot Dec 29 '17 at 19:40
  • There are many, many TLDs that are not three characters. Like I said before, it needs to match anything that is not a ```:``` or not a ```/```. No specific character counts etc. – john_mc Dec 29 '17 at 19:43

1 Answer

You're very close! You can get the outcome you're looking for by using regex capturing groups.

const stripPath = stripProtocol.replace(/(\w)\/(.*)/, '$1');

Regular expression capturing groups (indicated by parentheses) remember what they matched. With JavaScript's replace, you can insert those matches into the replacement string using $1, $2, and so on.

In your example, 'http://stackoverflow.com/question/ask'.replace(/(\w)\/(.*)/, '$1') matches m/question/ask and replaces it with m.
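Putting the two replaces together, here is a minimal sketch of how the mask could behave on each keystroke. The function name stripToDomain is made up for illustration; the two regexes are the ones from the question and from this answer.

```javascript
// Hypothetical helper: stripToDomain is an illustrative name, not part of any API.
function stripToDomain(val) {
  // Remove a leading protocol such as "http://" (or a bare leading "//").
  const stripProtocol = val.replace(/(^\w+:|^)\/\//, '');
  // Match the first "/" that directly follows a word character, plus everything
  // after it, and put the captured word character back with $1.
  return stripProtocol.replace(/(\w)\/(.*)/, '$1');
}

console.log(stripToDomain('http://stackoverflow.com'));              // stackoverflow.com
console.log(stripToDomain('http://stackoverflow.com/question/ask')); // stackoverflow.com
console.log(stripToDomain('http:/'));                                // http:/ (left alone mid-typing)
```

Because the second regex requires a word character before the slash, a partially typed protocol such as http:/ is not touched: the character before that slash is a colon, so nothing matches.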

MynockSpit
  • Thank you, this is exactly what I was looking for. – john_mc Dec 29 '17 at 19:46
  • As a side note: if your URL can always be made into a valid URL, you should use the [URL constructor](https://developer.mozilla.org/en-US/docs/Web/API/URL/URL) instead of regex. – MynockSpit Dec 29 '17 at 19:52
  • Agree about using the URL interface... browsers really understand URLs and all the properties are simple to access – charlietfl Dec 29 '17 at 19:54
  • It did cross my mind that the browser must somehow offer an API for this, but I was too quick to assume that I could only do so for ```window.location``` – john_mc Dec 29 '17 at 19:59
  • It turns out that the URL constructor throws an error if you don't provide a protocol or a subdomain in the URL string, which are inputs that I need to support. Sticking with the regex solutions... – john_mc Dec 29 '17 at 20:26
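
For readers who still want to try the URL constructor mentioned in the comments, here is a rough sketch of one way to work around the missing-protocol error. It assumes that prepending a placeholder http:// is acceptable for inputs without a scheme, and the name extractHostname is made up for illustration. It is not a drop-in replacement for the keystroke mask, since partially typed input will often fail to parse.

```javascript
// Hypothetical helper: extractHostname is an illustrative name.
// Assumption: inputs without a "scheme://" prefix can be parsed as if they were http URLs.
function extractHostname(val) {
  // Only treat the input as having a protocol if it starts with "scheme://".
  const hasScheme = /^\w+:\/\//.test(val);
  try {
    // new URL throws a TypeError when the string cannot be parsed.
    return new URL(hasScheme ? val : 'http://' + val).hostname;
  } catch (err) {
    // Signal "not parseable yet" instead of throwing.
    return null;
  }
}

console.log(extractHostname('http://stackoverflow.com/question/ask')); // stackoverflow.com
console.log(extractHostname('stackoverflow.com/question/ask'));        // stackoverflow.com
console.log(extractHostname(''));                                      // null
```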