I'm trying to detect all URLs in a block of free text. I'm using .NET's Regex.Matches call with the following regex: (http|https)://[^\s "']{4,}

Now, I've put in the following text:
here is a link http://somelink.com
here is a link that I didn't space withhttp://nospacelink.com/something?something=&39358235
http://nospacelink.com/something?something=&12233454
here is a link I already handled. Here is some secret t&cs you're not allowed to know https://somethingbad.com
Just to be a little annoying I've put in a new address thingy capture type of 'http://somethinginspeechmarks.com' and what are you going to do now?
here is a link http://postTextLink.com at then some post text
Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.

and I get the following output:

http://somelink.com
http://nospacelink.com?something=&39358235
http://nospacelink.com?something=&12233454
http://alreadyhandledlink.com
https://somethingbad.com
http://somethinginspeechmarks.com
http://postTextLink.com
http://alinkwithafullstoplink.com.

Please notice the full stop on the last entry. How can I update my regex to say "If there is a full stop at the end, please ignore it?"

Also, please note that "Getting parts of a URL (Regex)" has nothing to do with my question, as that question is about how to break down a particular URL. I want to extract multiple, complete URLs. Please see my input and current output for clarification! I already have a regex that does most of what I want, but it isn't quite right. Could you please explain where my approach might be improved?

Immortal Blue

2 Answers

I would add something like [^\.] to the pattern.

This pattern says that the last char can't be a full stop.

So (http|https)://[^\s "']{4,}[^\.] will try to match all addresses not ending with a full stop.

Edit:

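This one should be better, as suggested in the comments: [^.\s"'], which gives (http|https)://[^\s "']{4,}[^.\s"']. The plain [^\.] also matches whitespace, so it can pull the character after the full stop into the match.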
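
To check the difference between the two character classes, here is a minimal sketch. It uses Python's re module as a stand-in for .NET's Regex.Matches (an assumption made only to get a quick test; these particular constructs should behave the same in both engines). The names NAIVE, STRICT, find_urls and sample are made up for the illustration.

```python
import re

# Question's pattern plus [^\.] as the final character (first suggestion).
NAIVE = re.compile(r"""(http|https)://[^\s "']{4,}[^\.]""")

# Question's pattern plus [^.\s"'] as the final character (edited suggestion).
STRICT = re.compile(r"""(http|https)://[^\s "']{4,}[^.\s"']""")


def find_urls(pattern, text):
    # Hypothetical helper: return the full text of every match.
    # group(0) is used because findall() would return only the (http|https) group.
    return [m.group(0) for m in pattern.finditer(text)]


sample = (
    "Here is a link with a full stop "
    "http://alinkwithafullstoplink.com. And then some more."
)

print(find_urls(NAIVE, sample))
print(find_urls(STRICT, sample))
```

With NAIVE the match still contains the full stop, followed by the space after it, because a space is "not a full stop". With STRICT the engine has to backtrack until the last matched character is neither a full stop, whitespace nor a quote, so the match ends at "com".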

kulssaka

Updated:

Consider the following minor change to your pattern:

(http|https)://[^\s "']{4,}(?=\.)
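
A quick check of this pattern, sketched in Python's re module as a stand-in for .NET (an assumption made for testing only; lookarounds of this kind should behave the same in both engines). The lookahead (?=\.) requires a full stop directly after the match, so a URL with no trailing full stop appears to get cut at an inner dot. The negative lookbehind (?<!\.) shown next to it is an alternative that is not part of this answer; all names in the snippet are hypothetical.

```python
import re

# Pattern from this answer: the match must be followed by a full stop.
LOOKAHEAD = re.compile(r"""(http|https)://[^\s "']{4,}(?=\.)""")

# Alternative (not from the answer): the match must not end with a full stop.
LOOKBEHIND = re.compile(r"""(http|https)://[^\s "']{4,}(?<!\.)""")


def find_urls(pattern, text):
    # Hypothetical helper: return the full text of every match.
    return [m.group(0) for m in pattern.finditer(text)]


with_stop = "Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more."
without_stop = "here is a link http://somelink.com"

for name, pattern in (("lookahead", LOOKAHEAD), ("lookbehind", LOOKBEHIND)):
    print(name, find_urls(pattern, with_stop), find_urls(pattern, without_stop))
```

Both variants drop the trailing full stop from the first sample. On the second sample the lookahead version backtracks to the dot before "com" and returns only "http://somelink", while the lookbehind version keeps the whole URL.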

gpmurthy