
I'm a total regex noob and have been struggling with this problem for a while today. I have some content with URLs in it. I'd like to simply extract these URLs, but I'm having trouble matching up to the end of each URL.

I have a code sample here: https://regex101.com/r/2GfzWO/1

As you can see, it doesn't select the URLs correctly, and the last URL is not selected at all. Terrible :/

If anyone could steer me in the right direction, I would really appreciate it.

Update:

So as not to rely on the link above alone, I thought I'd add the regex here as well. Here is the original regex for this question:

/(?:http|ftp)s?:\/\/\S*\.\S*(?="|<)/gi

Here is the content I am testing with:

Here is some content and url <p>http://www.something.com/index.html</p>
<p>Some more content <a href="http://www.something.com/some/path/here.html">http://www.something.com/some/path/here.html</a></p>
Some more text http://www.something.com/something/somethingelse.html content 
continued...
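To see exactly what the original pattern returns, you can run it over the sample content, for example in Node.js. This is a sketch that assumes JavaScript regex semantics (the `/.../gi` literal suggests that flavor; the regex101 link may have been set to a different one), and the variable names are only illustrative.

```javascript
// The sample content from the question, joined with "\n" line breaks
const sample = [
  'Here is some content and url <p>http://www.something.com/index.html</p>',
  '<p>Some more content <a href="http://www.something.com/some/path/here.html">http://www.something.com/some/path/here.html</a></p>',
  'Some more text http://www.something.com/something/somethingelse.html content ',
  'continued...',
].join('\n');

// Original pattern: both \S* are greedy and the lookahead only accepts " or <
const originalRe = /(?:http|ftp)s?:\/\/\S*\.\S*(?="|<)/gi;

// With the g flag, String.prototype.match returns every match (or null)
const originalMatches = sample.match(originalRe);

// Two matches: the first URL is fine, but the anchor-tag match runs from the
// href URL through the link text up to "</a>", and the URL followed by a
// space is never matched at all
console.log(originalMatches);
```

Because `\S*` is greedy, the engine only backs off as far as the last `"` or `<` in each run of non-whitespace characters, which is why the anchor-tag match swallows both the href value and the link text. The third URL is followed by a space, which the lookahead does not accept.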

2 Answers


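In this case, you need a lazy match rather than a greedy one. Add a `?` right after the second `*` in your regex so that quantifier becomes lazy (`\S*?`) and stops at the first `"` or `<` it reaches instead of running on to the last one.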
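A small comparison on the anchor-tag line alone shows what each `?` changes. This is a JavaScript sketch assuming JavaScript regex semantics; `anchorLine` and the pattern names are illustrative.

```javascript
// The anchor-tag line from the question's sample content
const anchorLine =
  '<p>Some more content <a href="http://www.something.com/some/path/here.html">http://www.something.com/some/path/here.html</a></p>';

// Original: both quantifiers greedy
const greedyRe = /(?:http|ftp)s?:\/\/\S*\.\S*(?="|<)/gi;
// Only the second * made lazy
const secondLazyRe = /(?:http|ftp)s?:\/\/\S*\.\S*?(?="|<)/gi;
// Both * made lazy (the follow-up change described in the comments)
const bothLazyRe = /(?:http|ftp)s?:\/\/\S*?\.\S*?(?="|<)/gi;

const greedy = anchorLine.match(greedyRe);
const secondLazy = anchorLine.match(secondLazyRe);
const bothLazy = anchorLine.match(bothLazyRe);

// greedy and secondLazy each return one over-long match spanning both URLs;
// bothLazy returns the href URL and the link text as two separate matches
console.log({ greedy, secondLazy, bothLazy });
```

With only the second `*` lazy, the first `\S*` still grabs as much as it can and backs off only to the last `.` in the run, so the match still stretches from the href URL into the link text. Making the first `*` lazy as well lets each match end at the first `"` or `<` it reaches.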

tibetty
  • Thanks @XihuaDuan, I am one step closer. I also added '?' next to the first '*', which helped me better select the URL in the anchor tag. I'm still not able to select the last URL in the content. Any idea how I can do it? Here is the modified regex: https://regex101.com/r/2GfzWO/2 – Daniel Montano Aug 05 '17 at 15:10
  • I was able to select the last URL in my sample content by adding a space to my positive lookahead (`"` OR `<` OR a space): `(?:http|ftp)s?:\/\/\S*?\.\S*?(?="|<| )`. However, I added a URL at the end, on its own line, and that one isn't selected, I guess because it's followed by the end of the line rather than by one of those characters. How can I select URLs that are at the end of a line? Modified regex and content: https://regex101.com/r/2GfzWO/3 – Daniel Montano Aug 05 '17 at 15:25
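The intermediate pattern from this comment can be checked against a tail of the content where the last URL sits alone on the final line. This is a JavaScript sketch; the URL `http://www.something.com/last.html` is a hypothetical stand-in, since the updated regex101 content is not reproduced here.

```javascript
// Hypothetical tail of the content: the third line of the sample, then a URL
// alone on the last line (last.html is a made-up stand-in)
const tail = [
  'Some more text http://www.something.com/something/somethingelse.html content ',
  'http://www.something.com/last.html',
].join('\n');

// Intermediate pattern: lazy quantifiers, lookahead accepts ", < or a space
const intermediateRe = /(?:http|ftp)s?:\/\/\S*?\.\S*?(?="|<| )/gi;

const intermediateMatches = tail.match(intermediateRe);

// The URL followed by a space is found, but the final one is not: it is
// followed by the end of the input, which no lookahead alternative accepts
console.log(intermediateMatches);
```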

Looks like I was able to solve it by adding some more alternatives to my positive look-ahead:

(?:http|ftp)s?:\/\/\S*?\.\S*?(?="|<| |\n|\r|$)

So it will also detect a line feed (`\n`), a carriage return (`\r`), or the end of the string (`$`).
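Here is a sketch of the final pattern applied to the sample content plus a URL on its own last line. It assumes JavaScript semantics, where `$` without the `m` flag means the end of the whole input, and `http://www.something.com/last.html` is a hypothetical stand-in. The compact `(?=["<\s]|$)` lookahead is a suggested simplification, not part of the answer: `\s` covers the space, `\n`, and `\r` alternatives, but it also accepts tabs and other whitespace.

```javascript
// Sample content from the question, plus a hypothetical URL alone on the last line
const sampleWithLastLine = [
  'Here is some content and url <p>http://www.something.com/index.html</p>',
  '<p>Some more content <a href="http://www.something.com/some/path/here.html">http://www.something.com/some/path/here.html</a></p>',
  'Some more text http://www.something.com/something/somethingelse.html content ',
  'continued...',
  'http://www.something.com/last.html',
].join('\n');

// Final pattern from the answer
const finalRe = /(?:http|ftp)s?:\/\/\S*?\.\S*?(?="|<| |\n|\r|$)/gi;
// Suggested compact variant: \s also accepts tabs, form feeds, etc.
const compactRe = /(?:http|ftp)s?:\/\/\S*?\.\S*?(?=["<\s]|$)/gi;

const finalMatches = sampleWithLastLine.match(finalRe);
const compactMatches = sampleWithLastLine.match(compactRe);

// Five matches (the anchor-tag URL appears twice, once in href and once as
// link text), including the last URL, which is caught by the end-of-input $
console.log(finalMatches);
console.log(compactMatches);
```

On this sample both patterns return the same matches; they would only differ if a URL were followed by a tab or other whitespace character that the explicit list leaves out.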