extracting link from text

Question

I am trying to extract a link from a phrase. The link could be anywhere (at the start, in the middle, or at the end), so I am using this regex:

link=text.scan(/(^| )(http.*)($| )/)

But when the link is in the middle, the match takes in everything from the link to the end of the phrase. What should I do?

Avinash Raj · Accepted Answer · 2015-01-12T14:17:19.667

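It's because the .* after http is greedy: it matches as much as it can, spaces included, so the match runs on to the end of the phrase, where ($| ) is satisfied by $. I suggest using lookarounds instead.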
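A minimal sketch, on made-up sample strings, of what goes wrong with the question's pattern and with the obvious lazy fix. Switching to .*? stops the link at the next space, but ($| ) then consumes that space, so a link that begins right after it can no longer satisfy (^| ). Lookarounds only inspect the neighbouring characters without consuming them, which avoids both problems.

```ruby
# Hypothetical sample phrases, not taken from the question.
middle = "visit http://bar.com for details"
pair   = "http://a.com http://b.com"

# The question's pattern: greedy .* swallows everything up to the end,
# where ($| ) is satisfied by the $ anchor.
greedy = middle.scan(/(^| )(http.*)($| )/)

# Lazy .*? stops at the first following space, but ($| ) consumes that
# space, so the second link has no unconsumed space in front of it.
lazy = pair.scan(/(^| )(http.*?)($| )/)

# Lookarounds are zero-width: they check the neighbours without consuming them.
lookaround = pair.scan(/(?<!\S)http\S+(?!\S)/)

p greedy      # => [[" ", "http://bar.com for details", ""]]
p lazy        # => [["", "http://a.com", " "]]
p lookaround  # => ["http://a.com", "http://b.com"]
```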

link=text.scan(/(?<!\S)(http\S+)(?!\S)/)

OR

link=text.scan(/(?<!\S)(http\S+)/)
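The two variants return the same matches: \S+ is greedy, so it only stops in front of whitespace or at the end of the string, and at that point (?!\S) already holds. A quick check on a hypothetical string:

```ruby
text = "see http://bar.com and http://baz.com/path"

with_lookahead    = text.scan(/(?<!\S)(http\S+)(?!\S)/)
without_lookahead = text.scan(/(?<!\S)(http\S+)/)

# The trailing lookahead never rejects what the greedy \S+ has matched.
p with_lookahead == without_lookahead  # => true
```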

Example:

> "http://bar.com foo http://bar.com bar http://bar.com".scan(/(?<!\S)http\S+(?!\S)/)
=> ["http://bar.com", "http://bar.com", "http://bar.com"]
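One difference between the earlier scan calls and this example: the earlier ones wrap the pattern in a capture group, and when a regex has groups, String#scan returns one array of captures per match instead of plain strings. A sketch of two ways to get a flat list of links:

```ruby
text = "http://bar.com foo http://bar.com bar http://bar.com"

# With a capture group, scan returns one array of captures per match:
# [["http://bar.com"], ["http://bar.com"], ["http://bar.com"]]
grouped = text.scan(/(?<!\S)(http\S+)(?!\S)/)

# Option 1: flatten the nested arrays into plain strings.
links = grouped.flatten

# Option 2: drop the group, so scan returns the matched text directly.
links_direct = text.scan(/(?<!\S)http\S+(?!\S)/)

p links == links_direct  # => true
```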

DEMO

(?<!\S) Negative lookbehind which asserts that the match won't be preceded by a non-space character.
http\S+ Matches the substring http plus the following one or more non-space characters.
(?!\S) Negative lookahead which asserts that the match won't be followed by a non-space character.
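Two practical consequences of those rules, shown on a made-up phrase: a link glued to a preceding character such as ":" or "(" is skipped by the lookbehind, and because \S+ accepts any non-space character, trailing punctuation like a full stop stays attached to the link.

```ruby
text = "see:http://a.com or (http://b.com) or http://c.com."

# Only the last link is preceded by whitespace, and its "." is kept.
links = text.scan(/(?<!\S)http\S+/)

p links  # => ["http://c.com."]
```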

score 0 · Answer 2 · answered Jan 12 '15 at 14:12

Do all the links you are trying to match follow some simple pattern? We'd need to see more context to confidently provide a good solution to your problem.

For example, the regex:

link=text.scan(/http\S*\.com/)

...might be good enough for the job (this assumes all links end in ".com"), but I can't say for sure without more information.
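Because .* is greedy, a pattern like http.*\.com would run from the first http to the last .com when a phrase contains several links; restricting the middle to non-whitespace (\S*) keeps each link separate. A quick check on a hypothetical string:

```ruby
text = "http://foo.com and http://bar.com"

# Greedy .* backtracks only as far as the LAST ".com": one oversized match.
greedy = text.scan(/http.*\.com/)

# \S* cannot cross a space, so each link is matched on its own.
bounded = text.scan(/http\S*\.com/)

p greedy   # => ["http://foo.com and http://bar.com"]
p bounded  # => ["http://foo.com", "http://bar.com"]
```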

Or, as another example, you could use something like:

link=text.scan(/http[a-z.\/:]*/)

This assumes all links contain only lowercase letters, ".", "/" and ":".
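A quick check of how strict that character class is, on a made-up phrase: any character outside it, such as an uppercase letter or the ? of a query string, ends the match early. In a Ruby /.../ literal the slash inside the class is written as \/ so it does not terminate the regex.

```ruby
text = "docs at http://example.com/Guide?page=2 and http://foo.org/a"

# Matching stops at "G", the first character not in [a-z./:].
links = text.scan(/http[a-z.\/:]*/)

p links  # => ["http://example.com/", "http://foo.org/a"]
```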

extracting link from text

2 Answers2