regex detect url and prepend http://

Question

I would like to detect URLs that are entered in a text input. I have the following code, which prepends http:// to whatever has been entered:

var input = $(this);
var val = input.val();
if (val && !val.match(/^http([s]?):\/\/.*/)) {
    input.val('http://' + val);
}

How would I go about adapting this to only prepend the http:// if the input contains a string followed by a TLD? At the moment, if I enter a string such as:

Hello. This is a test

the http:// gets prepended to "Hello", even though it's not a URL. Any help would be greatly appreciated.

So what is defined as a valid URL or a TLD? What happens when companies create new TLDs, such as .canon? — Qantas 94 Heavy, Oct 21 '13 at 10:11
What is "tld"? A query string? Could you give an example to which you'd like to add http? — i100, Oct 21 '13 at 10:12
@danyo No, TLDs are domain "extensions", such as .com .eu .net etc... The **hello** part is second level domain. The **say** part in *say.hello.com* would be third level domain. — Teejay, Oct 21 '13 at 11:13

jacouh · Answer 1 · 2013-10-21T11:32:27.600

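This simple function works for me. For speed, it doesn't check whether the TLD actually exists; it only checks that the input has the syntax of a domain, like example.com.

Sorry, I forgot that trim(), which is built into VBA, is not available as a built-in in every JavaScript environment (older browsers lack String.prototype.trim), so: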

// Removes leading whitespace
function LTrim(value)
{
    return value.replace(/^\s+/, "");
}

// Removes trailing whitespace
function RTrim(value)
{
    return value.replace(/\s+$/, "");
}

// Removes leading and trailing whitespace
function trim(value)
{
    return LTrim(RTrim(value));
}

function hasDomainTld(strAddress)
{ 
  var strUrlNow = trim(strAddress);
  if(strUrlNow.match(/[,\s]/))
  {
    return false;
  }
  // Syntax check only: the input must end with "label.label".
  // A regex literal replaces the deprecated RegExp.prototype.compile().
  var regex = /[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+$/;
  return regex.test(strUrlNow);
}

In your code, $(this) is the window object, so I pass the input element in as an argument and use plain JavaScript instead of jQuery:

function checkIt(objInput)
{
  var val = objInput.value;
  // Leave the value alone if it already starts with http:// or https://
  if(/^https?:\/\//i.test(val)) {
    return false;
  }
  else if (hasDomainTld(val)) {
    objInput.value = 'http://' + val;
  }
}

Please test it yourself: http://jsfiddle.net/SDUkZ/8/

Do you have a working example of this? — danyo, Oct 21 '13 at 10:22

volpav · Answer 2 · 2013-10-21T13:05:06.277

You need to narrow down your requirements first as URL detection with regular expressions can be very tricky. These are just a few situations where your parser can fail:

IDNs (госуслуги.рф)
Punycode cases (xn--blah)
New TLD being registered (.amazon)
SEO-friendly URLs (domain.com/Everything you need to know about RegEx.aspx)
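To make the cases above concrete, here is a rough sketch of how a naive "name, dot, 2-3 letters" check behaves on them. The pattern `naiveTldPattern` and the sample strings are hypothetical, made up for illustration; they are not taken from any real parser.

```javascript
// Hypothetical naive check: an ASCII label, a dot, then 2-3 ASCII letters.
const naiveTldPattern = /^[a-z0-9-]+\.[a-z]{2,3}$/i;

const samples = [
  'example.com',      // plain ASCII domain: accepted
  'госуслуги.рф',     // IDN: rejected, the letters are not ASCII
  'example.xn--p1ai', // Punycode TLD: rejected, it has digits and hyphens
  'example.amazon',   // newer TLD: rejected, longer than 3 letters
  'domain.com/Everything you need to know about RegEx.aspx' // rejected, path with spaces
];

for (const s of samples) {
  console.log(s, '->', naiveTldPattern.test(s));
}
```

Only the first sample passes, even though all five are plausible things for a user to type into a URL field.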

We recently faced a similar problem and what we ended up doing was a simple check whether the URL starts with either http://, https://, or ftp:// and prepending with http:// if it doesn't start with any of the mentioned schemes. Here's the implementation in TypeScript:

public static EnsureAbsoluteUri(uri: string): string {
  var ret = uri || '', m = null, i = -1;
  var validSchemes = ko.utils.arrayMap(['http', 'https', 'ftp'], (i) => { return i + '://' });

  if (ret && ret.length) {
    m = ret.match(/[a-z]+:\/\//gi);

    /* Checking against a list of valid schemes and prepending with "http://" if check fails. */
    if (m == null || !m.length || (i = $.inArray(m[0].toLowerCase(), validSchemes)) < 0 ||
      (i >= 0 && ret.toLowerCase().indexOf(validSchemes[i]) != 0)) {

      ret = 'http://' + ret;
    }
  }

  return ret;
}

As you can see, we're not trying to be smart here, as we can't predict every possible URL form. Furthermore, this method is usually executed against field values we know are meant to be URLs, so the chance of misdetection is minimal.
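If you don't use Knockout or jQuery, the same idea can be sketched in plain JavaScript. This is not the code we run, just a minimal approximation; the name `ensureAbsoluteUri` is made up here, and it assumes that anchoring the scheme check at the start of the string is enough for your inputs.

```javascript
// Hypothetical plain-JS approximation of the scheme check described above.
function ensureAbsoluteUri(uri) {
  const value = uri || '';
  if (!value) {
    return value; // nothing to prepend to
  }
  // Accept only http://, https:// or ftp:// at the very start of the string.
  const hasKnownScheme = /^(https?|ftp):\/\//i.test(value);
  return hasKnownScheme ? value : 'http://' + value;
}

console.log(ensureAbsoluteUri('example.com'));             // http://example.com
console.log(ensureAbsoluteUri('HTTPS://example.com'));     // HTTPS://example.com
console.log(ensureAbsoluteUri('ftp://files.example.com')); // ftp://files.example.com
```

Note that this still prepends http:// to anything without a known scheme, including plain sentences, so it only makes sense on fields that are meant to hold URLs.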

Hope this helps.

score 0 · Accepted Answer · answered Oct 21 '13 at 11:52

The best solution I have found is to use the following regex:

/\.[a-zA-Z]{2,3}/

This detects the . that follows the domain name, plus 2 or 3 letters for the extension. Because the pattern is not anchored, a longer extension also matches on its first letters.

Does this seem ok for basic validation? Please let me know if you see any problems that could arise.

I know that it will also detect email addresses, but this won't matter in this instance.
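For anyone weighing this up, here is a quick sketch of what the pattern accepts, next to a slightly stricter variant. The names `tldLike`, `stricter` and `prependHttpIfUrlLike` are made up for this example, and the stricter pattern is only one possible tightening (no whitespace anywhere, extension at the end), not a real URL validator.

```javascript
const tldLike = /\.[a-zA-Z]{2,3}/;         // the pattern from this answer
const stricter = /^\S+\.[a-zA-Z]{2,}$/;    // hypothetical: no whitespace, extension at the end

console.log(tldLike.test('example.com'));           // true
console.log(tldLike.test('Hello. This is a test')); // false: the dot is followed by a space
console.log(tldLike.test('Hello.This is a test'));  // true: ".Thi" looks like an extension
console.log(stricter.test('Hello.This is a test')); // false: contains whitespace

// Hypothetical wrapper around the logic from the question.
function prependHttpIfUrlLike(val, pattern) {
  if (!val || /^https?:\/\//i.test(val)) {
    return val; // empty, or already has a scheme
  }
  return pattern.test(val) ? 'http://' + val : val;
}

console.log(prependHttpIfUrlLike('example.com', stricter)); // http://example.com
```

Both patterns still accept email addresses such as user@example.com, so that caveat stays.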
