I have a paragraph of running text that describes an object or thing, with URLs interleaved in the text. The URLs can take any of these forms:

  1. http://mail.google.com
  2. http://www.google.com
  3. www.google.com

I need to parse the paragraph using JavaScript and generate HTML content in which the URLs are rendered as HTML anchors. I could use the following:

var httpUrlPattern = /https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/\$~+#-]*[\w@?^=%&\/~+#-])?/;
text = text.replace( httpUrlPattern, '<a href="$&" target="_blank">$&</a>' );

This works fine for URLs of types 1 and 2, but for type 3 it generates href=/www.google.com

So I apply additional filtering:

var wwwUrlPattern = /(www\.)[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/\$~+#-]*[\w@?^=%&\/~+#-])?/;
text = text.replace( wwwUrlPattern, '<a href="http://$&" target="_blank">$&</a>' );

This fixes type 3 but breaks type 2.

Any suggestions on how I can handle all three cases?

tyrion
  • Are you trying to wrap links with anchor tags? See [this answer](http://stackoverflow.com/a/32584668/3832970). – Wiktor Stribiżew Sep 22 '15 at 11:58
  • Are the URLs on separate (on their own) lines? – SamWhan Sep 22 '15 at 12:13
  • @stribizhev: I get the links as plain text, I am wrapping it in an anchor so that they are clickable. – tyrion Sep 22 '15 at 12:23
  • @ClasG, Unfortunately not. They can be anywhere in the paragraph. – tyrion Sep 22 '15 at 12:23
  • @Abhi: Then, you do not have to re-invent the wheel, try the [Autolinker.js](https://github.com/gregjacobs/Autolinker.js) library. – Wiktor Stribiżew Sep 22 '15 at 12:29
  • @stribizhev: Thank you. I went through Autolinker.js and it would definitely solve my problem. But incorporating a third-party JS library at my organization has to go through a chain of management approvals. I agree with you about not re-inventing the wheel, so I think that would be my last resort if nothing else works out. – tyrion Sep 22 '15 at 13:17

2 Answers

Nest the groups

var wwwUrlPattern = /(http:\/\/)?((www\.)[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/\$~+#-]*[\w@?^=%&\/~+#-])?)/;
text = text.replace( wwwUrlPattern, '<a href="http://$2" target="_blank">$&</a>' );
4thex

The lack of criteria for how a URL is constructed makes this hard. I assume you want to catch URLs without the www or mail prefix, like stackoverflow.com. This makes the matching very uncertain. It could be something like this, though:

/\b[\w.,@?^=%&:/$~+#-]+\.\w\w+\b/

but there's a huge risk of false matches.
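To make that risk concrete, here is a small sketch that runs the loose pattern over ordinary prose. The sample sentence is made up for illustration, and the global flag is added only so that every match is collected.

```javascript
// The loose pattern from above, with the global flag added to collect all matches.
var loosePattern = /\b[\w.,@?^=%&:/$~+#-]+\.\w\w+\b/g;

// Hypothetical sample text that contains no URLs at all.
var sample = 'Save it as notes.txt, e.g. in version 1.25 of the app.';

// String.prototype.match with a global regex returns an array of all matched substrings.
var matches = sample.match(loosePattern);

console.log(matches); // [ 'notes.txt', '1.25' ]
```

Both a file name and a version number are picked up, although neither is a URL.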

To make it more specific, you could make either the http part or the www/mail part (and/or any other given set of prefixes) mandatory:

/\b((?:https?:\/\/|www\.|mail\.)[\w.,@?^=%&:/$~+#-]+)\.\w\w+\b/
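A minimal sketch of how this pattern might be used to cover all three cases from the question in a single pass. The helper name linkify is made up for illustration, the global flag is added so that every URL is replaced, and the input is assumed to be plain text with no existing markup.

```javascript
// Pattern from above, plus the global flag so all URLs in the text are replaced.
var urlPattern = /\b((?:https?:\/\/|www\.|mail\.)[\w.,@?^=%&:/$~+#-]+)\.\w\w+\b/g;

// linkify is a hypothetical helper name, not part of any library.
function linkify(text) {
  return text.replace(urlPattern, function (match) {
    // Prepend a scheme only when the matched text does not already carry one.
    var href = /^https?:\/\//i.test(match) ? match : 'http://' + match;
    return '<a href="' + href + '" target="_blank">' + match + '</a>';
  });
}

console.log(linkify('See www.google.com or http://mail.google.com for details.'));
```

Because the text is scanned only once, the www. inside an already matched http://www.google.com is not wrapped a second time. Note that, as written, the pattern ends at the last dot followed by two or more word characters, so a path such as /search after the host name would not be part of the link.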

Hope this helps.

Regards.

SamWhan
  • Thanks, this expression worked for me. My apologies, I did not reply earlier. But it would not have been good etiquette not to reply, so I am replying even though a few days have elapsed. Thanks again! – tyrion Oct 01 '15 at 10:31
  • @Abhi Glad to hear it worked :) Maybe accept the answer ;) – SamWhan Oct 02 '15 at 06:13