
You know how, if you go to facebook.com and enter a URL into the status-update textarea, it is detected automatically and Facebook displays a little snapshot of data from that link? It doesn't matter whether the URL includes a protocol such as http:// or not.

I'm looking to replicate this behavior. Right now I have this regular expression:

((?:https?:\/\/)?)((?:[a-zA-Z0-9\-]+\.)+(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2})(?:[a-z0-9\._\/~%\-\+&\#\?!=\(\)@]*)?(?:#?(?:[w]+)?)?)

I use it to match URLs entered in a textarea. However, it produces false positives: for example, it matches document.write(foo), which clearly isn't a URL.
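The false positive is easy to reproduce by running the question's regex in JavaScript (assumed here because the question is about a browser textarea). The `g` flag and the helper name `findUrlCandidates` are additions for illustration only.

```javascript
// The regex from the question, copied verbatim as a JavaScript literal,
// with the g flag added so matchAll can return every candidate.
const URL_PATTERN =
  /((?:https?:\/\/)?)((?:[a-zA-Z0-9\-]+\.)+(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2})(?:[a-z0-9\._\/~%\-\+&\#\?!=\(\)@]*)?(?:#?(?:[w]+)?)?)/g;

// Hypothetical helper: every substring of `text` the pattern treats as a URL.
function findUrlCandidates(text) {
  return Array.from(text.matchAll(URL_PATTERN), (m) => m[0]);
}

console.log(findUrlCandidates("check out yahoo.com today")); // matches "yahoo.com"
console.log(findUrlCandidates("document.write(foo)"));       // matches "document.write(foo)"
```

In the second case `document.` is taken as a domain label, `wr` satisfies the two-letter country-code branch `[a-z]{2}`, and the trailing character class then swallows `ite(foo)`.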

Facebook doesn't seem to have this issue. In fact, if I type "yahoo.com " into Facebook's textarea, it recognizes it as a URL, but if I type "example.com " it doesn't. So Facebook must be doing something more than regular-expression matching. Or am I wrong about this?

In short, I want to know what Facebook is doing and how I can replicate it. Any ideas, tips, or solutions are very much appreciated.

Thanks for reading.

Sam
    This question appears to be off-topic because it is about the implementation details of a closed-source web service. –  Sep 15 '13 at 01:11

3 Answers


The simplest regex to match any URL is:

[a-z_\.\-0-9]+\.[a-z]+

If this matches, do a lookup on the result. If the lookup fails, it wasn't a URL.
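A minimal sketch of "match, then look up", assuming the check runs server-side in Node.js: the answer's pattern finds candidates, and each one is kept only if DNS resolves it. `findLiveHosts`, `dnsResolve`, and the injectable `resolve` parameter are hypothetical names; the default relies on Node's `dns.promises.lookup`.

```javascript
// The answer's "simplest" pattern, verbatim, plus the g flag so that
// String.prototype.match returns every candidate instead of just the first.
const SIMPLE_PATTERN = /[a-z_\.\-0-9]+\.[a-z]+/g;

// Default resolver: Node's DNS lookup, whose promise rejects for unknown hosts.
// The module is required lazily, so a stub resolver works without Node's dns.
function dnsResolve(host) {
  return require("dns").promises.lookup(host);
}

// Return the candidates found in `text` whose hostnames resolve.
async function findLiveHosts(text, resolve = dnsResolve) {
  const candidates = text.match(SIMPLE_PATTERN) || [];
  const live = [];
  for (const host of candidates) {
    try {
      await resolve(host);
      live.push(host);
    } catch {
      // Lookup failed: treat the candidate as ordinary text, not a URL.
    }
  }
  return live;
}
```

For example, `findLiveHosts("I always use stackoverflow.com")` should resolve to `["stackoverflow.com"]` on a machine with working DNS. A resolving name is still not proof that a website answers there.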

There is no safe way to tell whether something is a URL if it's presented to you without the http:// prefix.

For example, the regex will match stackoverflow.com in the following string:

I always use stackoverflow.com to find the answers I need.

If you prepend "http://www." to the match ("http://www." & regex.match.value), you should get a valid URL... or not. You won't know until you do a lookup.
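The "prepend a prefix" step can be sketched with the standard `URL` constructor, available in browsers and Node.js. This variant is an assumption rather than the answer's exact expression: the hypothetical `toAbsoluteUrl` adds only `http://`, and only when no protocol is present, which mirrors Facebook accepting URLs with or without one. A successful parse proves only that the syntax is valid, not that a site exists.

```javascript
// Turn a bare candidate such as "stackoverflow.com" into an absolute URL.
// Unlike "http://www." & match, this adds only "http://" and only when no
// protocol is present, so "www.example.org" does not become "www.www...".
function toAbsoluteUrl(candidate) {
  const withProtocol = /^https?:\/\//i.test(candidate)
    ? candidate
    : "http://" + candidate;
  try {
    // The URL constructor throws a TypeError for syntactically invalid URLs.
    return new URL(withProtocol).href;
  } catch {
    return null; // not even a well-formed URL
  }
}

console.log(toAbsoluteUrl("stackoverflow.com")); // "http://stackoverflow.com/"
```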

Sedecimdies
  • The only issue with this, and with the regex I presented in my post, is the chance of false positives. Someone could type "nothing much.what are you up too?" and "much.what" would match as a URL. I guess the only way to overcome this problem is to check on the server side whether the domain is valid. – Sam Aug 19 '13 at 18:00
  • There is no way to know in advance whether a URL is a URL without looking it up. Even http://stockoverfliw.com can fail: it's a valid URL format, but there is no website there. You will get false positives; you need to do a lookup to be sure. – Sedecimdies Aug 20 '13 at 10:42

Facebook uses a contenteditable div that detects links. On your end, I would suggest listening to every keyup event, because that has multiple uses, e.g. after pressing @ you will see a list of friends too.
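A sketch of the keyup idea, assuming a textarea or contenteditable element with the hypothetical id `status`. The pure function `candidateOnKeyup` (also a hypothetical name) reuses the first answer's pattern but reports a candidate only once it is followed by whitespace at the end of the text, similar to how typing "yahoo.com " with a trailing space triggered detection in the question. The DOM wiring is guarded so the file also loads outside a browser.

```javascript
// First answer's pattern, anchored to the end of the text and required to be
// followed by whitespace, so a candidate is reported only once the user has
// finished typing it (e.g. "yahoo.com " but not "yahoo.co").
const TRAILING_CANDIDATE = /([a-z_\.\-0-9]+\.[a-z]+)\s$/;

// Pure helper: given the current text, return the just-completed candidate or null.
function candidateOnKeyup(text) {
  const m = TRAILING_CANDIDATE.exec(text);
  return m ? m[1] : null;
}

// Browser wiring, skipped when there is no DOM (e.g. under Node.js).
if (typeof document !== "undefined") {
  const box = document.getElementById("status"); // hypothetical element id
  if (box) {
    box.addEventListener("keyup", () => {
      // A textarea exposes .value; a contenteditable div only has textContent.
      const text = box.value !== undefined ? box.value : box.textContent;
      const candidate = candidateOnKeyup(text);
      if (candidate) {
        console.log("possible link:", candidate); // e.g. request a preview here
      }
    });
  }
}
```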

bitguider

Perhaps, before posting the guessed URL, it does an AJAX ping or something similar to make sure the candidate URL is actually alive before presenting it?
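Such a "ping" could look like the sketch below: a HEAD request with a timeout. `isAlive`, the 3-second default, and the rule that any status below 500 counts as alive are assumptions for illustration. In a browser, a cross-origin request to an arbitrary site is normally blocked by CORS, so in practice this check would probably run on your own server (Node.js 18+ ships a global `fetch`), with the page calling that server via AJAX.

```javascript
// Ask whether a candidate URL answers at all. `fetchImpl` defaults to the
// global fetch; it is a parameter so the check can use a stub or a proxy.
async function isAlive(url, fetchImpl = globalThis.fetch, timeoutMs = 3000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { method: "HEAD", signal: controller.signal });
    // Assumed policy: anything below 500 (even 404 or 405) means a server answered.
    return res.status < 500;
  } catch {
    return false; // DNS failure, refused connection, timeout, or CORS rejection
  } finally {
    clearTimeout(timer);
  }
}
```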

menriquez