You know how, if you go to facebook.com and enter a URL into the status update textarea, it is automatically detected and Facebook displays a little snapshot of data from that URL/link? Facebook doesn't even care whether you enter the URL with or without a protocol like http://.
I'm looking to replicate this behavior. Right now I have this regular expression:
((?:https?:\/\/)?)((?:[a-zA-Z0-9\-]+\.)+(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2})(?:[a-z0-9\._\/~%\-\+&\#\?!=\(\)@]*)?(?:#?(?:[w]+)?)?)
I use it to match URLs entered in a textarea. However, it produces false positives: it will match document.write(foo), which clearly isn't a URL.
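To make the false positive concrete, here is a minimal sketch that runs the pattern above through Python's `re` module. Python is only my choice for illustration, and the names `URL_RE` and `match_url` are made up for this example. As far as I can tell, the match comes from the `[a-z]{2}` country-code alternative, which accepts the `wr` in `write` as if it were a TLD, after which the trailing character class swallows `ite(foo)`.

```python
import re

# The pattern from the question, unchanged (split across lines for readability).
URL_RE = re.compile(
    r"((?:https?:\/\/)?)"
    r"((?:[a-zA-Z0-9\-]+\.)+"
    r"(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2})"
    r"(?:[a-z0-9\._\/~%\-\+&\#\?!=\(\)@]*)?(?:#?(?:[w]+)?)?)"
)

def match_url(text):
    """Return the first substring the pattern treats as a URL, or None."""
    m = URL_RE.search(text)
    return m.group(0) if m else None

print(match_url("yahoo.com "))           # a real URL, matched as expected
print(match_url("document.write(foo)"))  # the false positive: the whole call is matched
```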
Facebook doesn't seem to have this issue. In fact, I can type "yahoo.com " into Facebook's textarea and it will recognize it as a URL, but if I type "example.com " it won't. So Facebook must be doing something more than just regular expression matching. Or am I wrong about this?
In conclusion, I want to know what Facebook is doing and how I can replicate it. Any ideas, tips, or solutions are very much appreciated.
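I don't know what Facebook actually does, but one guess is a two-step process: use the regular expression only to find candidates, then confirm each candidate some other way (for instance, by trying to fetch it server-side) before treating it as a link. Below is a minimal Python sketch of that idea. Everything in it is hypothetical: `CANDIDATE_RE` is a deliberately simplified stand-in for my pattern, `find_links` is a made-up helper, and the `is_reachable` callback stands in for whatever real check would be used. Here it is stubbed with a fixed set so the example runs without network access.

```python
import re

# Simplified, hypothetical candidate pattern: optional scheme, dotted host, optional path.
CANDIDATE_RE = re.compile(r"(?:https?://)?(?:[a-zA-Z0-9-]+\.)+[a-z]{2,}(?:/\S*)?")

def find_links(text, is_reachable):
    """Return the candidates in `text` that pass the `is_reachable` check."""
    links = []
    for m in CANDIDATE_RE.finditer(text):
        candidate = m.group(0)
        # Add a scheme when the user left it out.
        if candidate.startswith(("http://", "https://")):
            url = candidate
        else:
            url = "http://" + candidate
        if is_reachable(url):
            links.append(url)
    return links

# Stub for illustration only; a real check might issue an HTTP request instead.
KNOWN_GOOD = {"http://yahoo.com"}
print(find_links("see yahoo.com or document.write(foo)", KNOWN_GOOD.__contains__))
```

With a check like this, `document.write` would still be picked up as a candidate by the regex, but it would be dropped in the second step because nothing answers at that address.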
Thanks for reading.