If you need to find URLs in a text, you don't need to follow the RFC (whatever its number). It's useless for this task, and a pattern that follows the standard is nearly impossible to write: it would be too slow and too complex.
All URLs in the text should be considered valid (whether or not they were validated before being inserted in the text is up to the people who produced it. In other words, it is not your job!).
So you must find another approach. To do this, ask the right question: how do you distinguish a URL from the surrounding text?
Let's list the common criteria:
- a URL may begin with the protocol: http, https, ftp, sftp, ftps, gopher, ...
- a URL may begin with www.
- a URL does not contain whitespace characters
- a URL always begins at a word boundary
- a URL may end before a whitespace character, the end of the string, or a punctuation character other than the question mark (which can be present even if there are no GET parameters)
With these requirements, you can easily build a naive pattern for the http protocol:
\b(https?://|www\.)\S+(?=\s|[^\P{P}?]|\z)
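As an illustration only, here is a minimal Python sketch of the same criteria. Python's re module has no \p{P} class, so this is not a direct port of the pattern above: it matches the prefix and the non-whitespace run with a regex, then trims trailing punctuation (except the question mark) using unicodedata. The trimming step is there because a greedy \S+ runs up to the next whitespace and so takes any trailing punctuation with it. The names URL_START, is_trailing_punct and find_urls are hypothetical.

```python
import re
import unicodedata

# Prefix criteria from the list: word boundary, then "http://", "https://" or "www.",
# followed by a run of non-whitespace characters.
URL_START = re.compile(r"\b(?:https?://|www\.)\S+")


def is_trailing_punct(ch):
    """True for Unicode punctuation (general category P*), except the question mark."""
    return ch != "?" and unicodedata.category(ch).startswith("P")


def find_urls(text):
    """Return URL candidates found in text, without trailing punctuation."""
    urls = []
    for match in URL_START.finditer(text):
        candidate = match.group(0)
        # The greedy \S+ stops only at whitespace, so drop punctuation
        # that merely ends the sentence or closes a parenthesis.
        while candidate and is_trailing_punct(candidate[-1]):
            candidate = candidate[:-1]
        urls.append(candidate)
    return urls


print(find_urls("See http://example.com/a?b=1, or (www.example.org). Done"))
# ['http://example.com/a?b=1', 'www.example.org']
```

Note that this sketch also trims a trailing slash, since the slash is punctuation too; whether that is acceptable depends on your text.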
Note that once you obtain a result, you are free to check the validity of the URL with a built-in function (which generally doesn't handle all cases, but now you know why :).
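For example, in Python such a check could be sketched with urllib.parse.urlsplit from the standard library. This is only a loose sanity check under stated assumptions: the helper name looks_valid is hypothetical, and prepending "http://" to candidates that start with "www." is my own choice, not something the text prescribes.

```python
from urllib.parse import urlsplit


def looks_valid(candidate):
    """Loose check: the candidate parses and has an http(s) scheme and a host part."""
    # Assumption: bare "www." candidates are treated as http URLs.
    if candidate.startswith("www."):
        candidate = "http://" + candidate
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit raises ValueError on some malformed input,
        # e.g. an unclosed IPv6 bracket.
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


print(looks_valid("http://example.com/a?b=1"))  # True
print(looks_valid("http://"))                   # False
```

As said above, a function like this does not catch every invalid URL; it only filters out the obviously broken candidates.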