28

Possible Duplicates:
Regex to match URL
regex to remove the webpage part of a url in ruby

I am looking for a regular expression that finds all the URLs in a file.
I tried many of the regular expressions I found by googling, but each one fails in one case or another. My idea is to write one that checks for http or https at the beginning and matches everything until it sees a blank space.
Any ideas?
NOTE: I don't need to parse the URLs, only erase all of them from a file, or at least make them unreadable.

Community
Krishna Prasad Varma
  • P.S.: Where do you see Rails here? I'm deleting this tag. Do you know the difference between Ruby and Rails? – Nakilon Jan 17 '11 at 18:37
  • Yeah, the possible duplicate questions are hardly duplicates, although one could find a good answer there: `URI.parse` or `URI::DEFAULT_PARSER.make_regexp`. And I don't even see a reopen vote here. – akostadinov Jan 28 '23 at 18:08
  • This one is for Rails, but its answers are not Rails-only: https://stackoverflow.com/q/161738/520567 – akostadinov Jan 28 '23 at 18:08

2 Answers

69

The standard URI library provides URI.regexp, which returns a regular expression that matches URL strings.

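 require 'uri'
 # `string` holds the text to search. The pattern contains capture groups,
 # so scan returns one array of URL components per match.
 string.scan(URI.regexp)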

http://ruby-doc.org/stdlib/libdoc/uri/rdoc/index.html
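Since the question is about erasing URLs rather than collecting them, here is a minimal sketch of how that might look with the same library. It assumes `URI::DEFAULT_PARSER.make_regexp` as the source of the pattern (recent Ruby versions mark `URI.regexp` as obsolete), restricted to the http and https schemes. The helper name `strip_urls` and the sample text are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): replaces every
# http/https URL found in `text` with `replacement`.
def strip_urls(text, replacement = '')
  # Assumption: limit matching to the two schemes the question mentions.
  url_pattern = URI::DEFAULT_PARSER.make_regexp(%w[http https])
  text.gsub(url_pattern, replacement)
end

sample = 'Docs at http://example.com/a?b=1 and https://example.org/x here'
puts strip_urls(sample, '[removed]')
```

To process a whole file, something like `File.write(path, strip_urls(File.read(path)))` should work, where `path` is whatever file you want to clean.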

rogerdpack
John Dyer
26

You can try this:

/https?:\/\/[\S]+/

The \S means any non-whitespace character.

(Rubular)
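To erase the URLs (as the question asks) rather than just match them, the pattern can be passed to `gsub`. This is only a sketch: the constant `SIMPLE_URL_PATTERN`, the helper `redact_urls`, and the placeholder text are names invented for this example.

```ruby
# Same idea as the pattern above: http or https, then everything up to
# the next whitespace character.
SIMPLE_URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: swaps each matched URL for a placeholder string.
def redact_urls(text, placeholder = '[URL removed]')
  text.gsub(SIMPLE_URL_PATTERN, placeholder)
end

puts redact_urls('Visit http://example.com/page now')
```

Because `\S+` runs until the next whitespace, punctuation directly after a URL (a trailing comma or closing parenthesis, for instance) gets swallowed too. That is usually fine when the goal is just to make the URLs unreadable.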

Mark Byers