I am trying to extract a link from a phrase. The link could be anywhere (first, last, or in the middle), so I am using this regex:

link=text.scan(/(^| )(http.*)($| )/)

The problem is that when the link is in the middle, the match includes the rest of the phrase up to the end. What should I do?

Patrick Oscity
Safouen

2 Answers

That's because the .* after http is greedy: it matches as many characters as possible, spaces included. I suggest using lookarounds instead.

link=text.scan(/(?<!\S)(http\S+)(?!\S)/)

OR

link=text.scan(/(?<!\S)(http\S+)/)

Example:

> "http://bar.com foo http://bar.com bar http://bar.com".scan(/(?<!\S)http\S+(?!\S)/)
=> ["http://bar.com", "http://bar.com", "http://bar.com"]

  • (?<!\S) Negative lookbehind which asserts that the match is not preceded by a non-space character.

  • http\S+ Matches the substring http plus the following one or more non-space characters.
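
To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question with the lookaround version. The sample string and the variable names (`text`, `greedy`, `links`, `grouped`) are made up for illustration.

```ruby
# Hypothetical sample input with the link in the middle of the phrase.
text = "see http://bar.com for details"

# Greedy pattern from the question: .* also matches spaces, so it runs to
# the end of the string and ($| ) is satisfied by the end anchor.
greedy = text.scan(/(^| )(http.*)($| )/)
# greedy => [[" ", "http://bar.com for details", ""]]

# Lookaround pattern: \S+ cannot cross whitespace, so the match stops
# at the end of the link.
links = text.scan(/(?<!\S)http\S+(?!\S)/)
# links => ["http://bar.com"]

# With a capture group, scan returns one array per match instead.
grouped = text.scan(/(?<!\S)(http\S+)(?!\S)/)
# grouped => [["http://bar.com"]]
```

If you keep the capture group, calling `flatten` on the result gives a flat array of strings again.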

Avinash Raj

Do all the links you are trying to match follow some simple pattern? We'd need to see more context to confidently provide a good solution to your problem.

For example, the regex:

link=text.scan(/http.*?\.com/)

...might be good enough for the job (this assumes all links end in ".com"), but I can't say for sure without more information.

Or, as another example, perhaps you could use something like:

link=text.scan(/http[a-z.\/:]*/)

This assumes all links contain only lowercase letters, ".", "/" and ":".
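
As a rough sketch of how the character-class approach behaves, and where it breaks down, consider the following. The sample strings and variable names are invented for illustration, and the `/` inside the class is escaped so the regex literal parses.

```ruby
# Hypothetical input where the link uses only lowercase letters, ".", "/" and ":".
text = "foo http://bar.com/path baz"

# The character class cannot match a space, so the match ends with the link.
links = text.scan(/http[a-z.\/:]*/)
# links => ["http://bar.com/path"]

# Limitation: the match stops at the first character outside the class,
# so uppercase letters, digits or "?" cut the link short.
truncated = "http://bar.com/Page?id=1".scan(/http[a-z.\/:]*/)
# truncated => ["http://bar.com/"]
```

Whether that limitation matters depends on what your links actually look like.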

Tom Lord