How to extract a URL from a Tweet with a JavaScript RegEx?

Question

Assuming that I have the tweet stored as a string in a JS variable...

How to extract a URL from a tweet with a JavaScript RegEx?

This should be much easier than extracting a URL from an arbitrary string, because I will assume that anything that starts with http or www and ends with a blank space (or the end of the tweet) is a URL.

You just need a JS regex that matches URLs. There are plenty of questions on SO which answer that. — Matt Ball, Jun 05 '11 at 04:21
I looked around, but I don't see any good answers. For example, these don't work: http://stackoverflow.com/questions/4043098/extract-url-from-string-with-javascript — edt, Jun 05 '11 at 16:19

arcain · Accepted Answer · 2011-06-05T15:57:02.787

Here is one of the regular expressions that I've used for pulling links from Twitter statuses.

Link Match Pattern

(?:<\w+.*?>|[^=!:'"/]|^)((?:https?://|www\.)[-\w]+(?:\.[-\w]+)*(?::\d+)?(?:/(?:(?:[~\w\+%-]|(?:[,.;@:][^\s$]))+)?)*(?:\?[\w\+%&=.;:-]+)?(?:\#[\w\-\.]*)?)(?:\p{P}|\s|<|$)
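Note that the pattern above may not run unchanged in JavaScript: the forward slashes have to be escaped inside a regex literal, and \p{P} is only accepted when the expression is created with the u flag. As a rough sketch of the simpler rule stated in the question (anything that starts with http or www and runs to the next whitespace or the end of the tweet), something like the following could work. The function name extractUrls and the trailing-punctuation cleanup are my own assumptions, not part of the pattern above.

```javascript
// Hypothetical helper; a sketch of the rule from the question, not a full URL grammar.
function extractUrls(tweet) {
  // Candidate = run of non-whitespace characters starting with http://, https://, or www.
  var candidates = tweet.match(/\b(?:https?:\/\/|www\.)\S+/gi) || [];
  return candidates.map(function (candidate) {
    // Assumption: trailing punctuation (as in "www.something.com!") is not part of the URL.
    return candidate.replace(/[.,;:!?)\]'"]+$/, "");
  });
}

var found = extractUrls("Wow, www.something.com! See also http://www.something.com");
```

Here found should hold both links without the trailing "!". This is deliberately permissive, so it will also accept things that are not valid URLs.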

Alternatively, if you control how the statuses are fetched from Twitter, you can pass the include_entities parameter to statuses/show (or any other method that supports it, such as statuses/user_timeline) to have Twitter break out the links, mentions, and hashtags for you, like the following:

http://api.twitter.com/1/statuses/show/23918022347456512.json?include_entities=true

In the resultant JSON, notice the entities object.

"entities":{"urls":[{"expanded_url":null,"indices":[27,53],"url":"http:\/\/tinyurl.com\/38wp7nt"}],"hashtags":[],"user_mentions":[]}

Now you can reference the data returned from Twitter rather than parsing the text yourself. The best part of this approach is that you offload the work to Twitter and never have to worry about whether your regular expression matches Twitter's exactly.
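As a minimal sketch of reading those links in JavaScript, assuming the response body has already been fetched as a string: the helper name urlsFromStatus is hypothetical, and responseBody below is just the entities fragment shown above wrapped in braces so that it parses on its own.

```javascript
// The entities fragment from the response above, wrapped in braces to make it valid JSON.
var responseBody = '{"entities":{"urls":[{"expanded_url":null,"indices":[27,53],"url":"http:\\/\\/tinyurl.com\\/38wp7nt"}],"hashtags":[],"user_mentions":[]}}';

// Hypothetical helper: collect the links Twitter has already identified.
function urlsFromStatus(status) {
  var entities = status.entities || {};
  return (entities.urls || []).map(function (entry) {
    // Use the expanded form when it is present, otherwise the URL as tweeted.
    return entry.expanded_url || entry.url;
  });
}

var urls = urlsFromStatus(JSON.parse(responseBody));
```

The indices pair in each entry gives the position of the link within the tweet text, which is handy if you want to replace it with an anchor tag.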

score 0 · Answer 2 · answered Jun 05 '11 at 04:36

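var stringToCheck = "http://www.something.com";

// Forward slashes must be escaped inside a regex literal.
/^http:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/\S*)?$/.test(stringToCheck); // true if stringToCheck is a URL

This checks for two- or three-letter TLDs and accounts for subdomains. Because the pattern is anchored with ^ and $, it validates a string that is nothing but a URL; it does not find a URL inside longer text.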

citizen conn


The string I'm trying to check is a Twitter tweet. So, stringToCheck would be something like: "Check out the awesome http://www.something.com" or "The www.something.com is awesome!" – edt Jun 05 '11 at 05:14
stringToCheck.match(/\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^\s!-\/:-@\[-`{-~]|\/)))/); // returns a match array if stringToCheck contains a URL, otherwise null – citizen conn Jun 06 '11 at 00:14
