
I need to split a given text by the URLs it might contain, while keeping the URL separators in the resulting array.

For example, splitting this text:

"An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com."

would result in this array:

["An example text that contains many links such us ", "http://www.link1.com", ", ", "https://www.link2.com/path?param=value", ", ", "www.link3.com", " and ", "link-4.com", "."]

I tried to use String.prototype.split() with a regular expression, but it does not work: the resulting array also contains unwanted fragments of the URLs themselves:

var text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";
console.log(text.split(/((https?:\/\/)|([\w-]{2,}[.])+([\S]{2,})[^\s|,!$\^\*;:{}`()])+/ig));
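For context, here is a small sketch of why the output above contains extra pieces. `split` splices the value of every capturing group of the separator into the result, and a group that did not take part in a match shows up as `undefined`. The short sample strings are made up for illustration only.

```javascript
// One capturing group: the separator is kept exactly once.
const single = "a1b".split(/(\d)/);

// Nested capturing groups: the separator appears once per group.
const nested = "a1b".split(/((\d))/);

// An optional group that does not participate in the match yields undefined.
const optional = "a-b".split(/(x)?-/);

console.log(single);   // [ 'a', '1', 'b' ]
console.log(nested);   // [ 'a', '1', '1', 'b' ]
console.log(optional); // [ 'a', undefined, 'b' ]
```

The pattern in the snippet above has several capturing groups, so each match contributes several entries instead of one.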

EDIT

This question is different from the suggested ones. My goal is not to check whether a URL is valid, but to find a regular expression that can be used in the split method and that splits the text correctly.

As for splitting a text by a regex, the snippet above already does that. The suggested question is more general; what I am looking for is specific to URLs.

Strider
  • Try `s.split(/(https?:\/\/\S*)\b/)` – Wiktor Stribiżew Feb 22 '20 at 22:47
  • @WiktorStribiżew It's working only on the links that begin with 'http://' or 'https://' – Strider Feb 22 '20 at 22:52
  • @LucaKiebel That's what I am applying, but the problem is that the regular expression is not correct, and I don't get the expected result – Strider Feb 22 '20 at 22:54
  • A link DOES start with http or https !!! Is this a link: abcd.ef ? NO – Poul Bak Feb 22 '20 at 22:54
  • @PoulBak Sure, but my objective is to detect anything in the text that might refer to a link, and if it does not contain a protocol, one can be prepended programmatically. – Strider Feb 22 '20 at 23:01
  • URLs are *very* complex beasts, with lots of little-used options and corner cases. Be careful! – vonbrand Feb 23 '20 at 02:30
  • Why is this flagged as a duplicate? They're completely different problems. – Laurensius Adi Jan 22 '21 at 06:43
  • Since I can't answer anymore, I'll put my answer here. `text.split(/(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[\da-zA-Z]{2,}\b([-a-zA-Z0-9@:%_+.~#?&//=,]*))/gi).filter(Boolean).filter(s => s.indexOf('/') !== 0 )` – Laurensius Adi Jan 22 '21 at 06:44

1 Answer


It's not ideal, and it would be hard to find or write a perfect regex that you could test against every case, but you can quickly write something like this:

var text2 = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";

text2
    .split(/(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/ig)
    .filter(Boolean)
    .filter((x)=>{ return x.indexOf('.')>0 })
Dom
  • Thanks for the answer but the `filter` method eliminates the text parts that are not links like – Strider Feb 22 '20 at 23:19
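Following up on the comment above, here is a minimal sketch of one way to keep both the links and the text around them. It assumes the whole URL pattern sits in a single capturing group, so `split` keeps each link exactly once, and that every inner group is non-capturing. The helper name `splitByUrls` and the pattern itself are illustrative assumptions, not a complete URL grammar: the pattern requires a top-level domain of at least two letters and ends a path at whitespace or a comma.

```javascript
// Hypothetical helper: splits text on URL-like substrings and keeps them.
function splitByUrls(text) {
  // The outer (...) is the only capturing group, so each link is kept once.
  //   (?:https?:\/\/)?   optional protocol
  //   (?:[\w-]+\.)+      one or more dot-terminated labels, e.g. "www.link1."
  //   [a-z]{2,}          top-level domain, at least two letters (assumption)
  //   (?:\/[^\s,]*)?     optional path, ending at whitespace or a comma
  const urlPattern = /((?:https?:\/\/)?(?:[\w-]+\.)+[a-z]{2,}(?:\/[^\s,]*)?)/i;
  // Drop the empty strings produced when the text starts or ends with a link.
  return text.split(urlPattern).filter((part) => part !== "");
}

const text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";
const parts = splitByUrls(text);
console.log(parts);
```

For the sample text from the question, this should produce the nine-element array listed there, with the plain-text pieces left in place. Anything with a letters-only suffix after a dot (for example `file.txt`) would also be treated as a link, so the pattern may need tightening for real input.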