If you don't want to use regular expressions, then you'll need to use things like indexOf and such instead. For instance, search for "://" in the text of every element, and if you find it and the bit in front of it looks like a protocol (or "scheme"), grab it along with the following characters that are valid URI characters (RFC2396). If the result ends in a dot or question mark, remove the dot or question mark (it probably ends a sentence). There's not really a lot more to say.
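The indexOf approach described above could look roughly like this. It's a minimal sketch, not a robust URL parser: the function and constant names (findUrls, URI_CHARS, SCHEME_CHARS) are hypothetical, and the URI character set is an approximation of RFC2396's unreserved and reserved characters plus "%" for escapes, with "#" added so fragments are kept.

```javascript
// Approximate RFC2396 URI characters: alphanumerics, "mark" characters,
// reserved characters, "%" for escapes, plus "#" so fragments survive.
const URI_CHARS = /[A-Za-z0-9\-_.!~*'();\/?:@&=+$,%#]/;
// Characters allowed inside a scheme (letters, digits, "+", "-", ".").
const SCHEME_CHARS = /[A-Za-z0-9+\-.]/;

// Hypothetical helper: finds URLs using indexOf instead of one big regex.
function findUrls(text) {
  const urls = [];
  let pos = text.indexOf("://");
  while (pos !== -1) {
    // Walk backwards over characters that are legal in a scheme.
    let start = pos;
    while (start > 0 && SCHEME_CHARS.test(text[start - 1])) {
      start--;
    }
    // A scheme must begin with a letter, so drop any leading non-letters.
    while (start < pos && !/[A-Za-z]/.test(text[start])) {
      start++;
    }
    if (start < pos) {
      // Walk forward over valid URI characters after "://".
      let end = pos + 3;
      while (end < text.length && URI_CHARS.test(text[end])) {
        end++;
      }
      let url = text.slice(start, end);
      // A trailing "." or "?" most likely ends the sentence, not the URL.
      if (/[.?]$/.test(url)) {
        url = url.slice(0, -1);
      }
      urls.push(url);
      pos = text.indexOf("://", end);
    } else {
      // Nothing scheme-like in front of "://"; keep searching.
      pos = text.indexOf("://", pos + 3);
    }
  }
  return urls;
}
```

For example, `findUrls("Visit http://example.com/page?x=1. Or try ftp://files.example.org?")` gives `["http://example.com/page?x=1", "ftp://files.example.org"]`. Note that closing parentheses are valid URI characters, so a URL wrapped in parentheses will keep its ")"; handling that is left as a further heuristic.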
Update: Ah, I see from your edit that you don't have a problem with regular expressions, just the ones in the answers to that question. Fair enough.
This may well be one of those places where trying to do it all with a single regular expression is more work than it should be, but using regular expressions as part of the solution is helpful. For instance,
/[a-zA-Z][a-zA-Z0-9+\-.]*:\/\//
...may well be a helpful way to find the beginning of a URL, since the scheme portion must start with a letter and can then have zero or more letters, digits, "+", "-", or "." characters prior to the ":" (RFC2396, section 3.1).
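One way to combine the two ideas is to let the scheme regex find where each URL starts and then scan forward by hand for the rest. This is a sketch under assumptions: extractUrls and URI_CHAR are hypothetical names, and the character set is the same rough approximation of RFC2396 (plus "#") rather than a full grammar.

```javascript
// The scheme regex from above, with the global flag so exec() can iterate.
const SCHEME_START = /[a-zA-Z][a-zA-Z0-9+\-.]*:\/\//g;
// Approximate set of characters accepted in the rest of the URL.
const URI_CHAR = /[A-Za-z0-9\-_.!~*'();\/?:@&=+$,%#]/;

// Hypothetical helper: regex for the start, manual scan for the remainder.
function extractUrls(text) {
  const urls = [];
  SCHEME_START.lastIndex = 0; // reset state left over from earlier calls
  let match;
  while ((match = SCHEME_START.exec(text)) !== null) {
    // Continue from just after "://" while characters look like URI chars.
    let end = SCHEME_START.lastIndex;
    while (end < text.length && URI_CHAR.test(text[end])) {
      end++;
    }
    let url = text.slice(match.index, end);
    // Drop a trailing "." or "?" that probably ends the sentence.
    if (/[.?]$/.test(url)) {
      url = url.slice(0, -1);
    }
    urls.push(url);
    // Resume the regex search after this URL.
    SCHEME_START.lastIndex = end;
  }
  return urls;
}
```

The regex does the part it's good at (recognizing a valid scheme), while the simple loop handles the "take everything that's a URI character" part, which keeps the regex short and readable.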