Split text by urls

Question

I need to split a given text by the URLs it might contain, while keeping the URLs (the separators) in the resulting array.

For example splitting this text:

"An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com."

should result in this array:

["An example text that contains many links such us ", "http://www.link1.com", ", ", "https://www.link2.com/path?param=value", ", ", "www.link3.com", " and ", "link-4.com", "."]

I tried to use String.prototype.split() with a regular expression, but it doesn't work: the result contains unwanted fragments of the URLs themselves:

var text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";
console.log(text.split(/((https?:\/\/)|([\w-]{2,}[.])+([\S]{2,})[^\s|,!$\^\*;:{}`()])+/ig));
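A likely cause of the stray fragments: `String.prototype.split()` inserts the text of every capturing group in the separator regex into the result, and a group that did not take part in a particular match contributes `undefined`. The regex above has several nested groups, so each URL comes back in pieces. A minimal illustration, with digits standing in for URLs:

```javascript
// One capturing group: each matched separator is kept exactly once.
const oneGroup = "a1b2c".split(/(\d)/);
// ["a", "1", "b", "2", "c"]

// Nested groups: the same separator text is emitted once per group.
const nested = "a1b".split(/((\d))/);
// ["a", "1", "1", "b"]

// An optional group that did not participate leaves an undefined slot.
const optional = "a1b".split(/(x)?(\d)/);
// ["a", undefined, "1", "b"]

console.log(oneGroup, nested, optional);
```

So a regex meant for `split` should have exactly one capturing group around the whole URL, with any inner grouping written as non-capturing `(?:...)`.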

EDIT

This question is different from the suggested ones: my purpose is not to check whether a URL is valid, but to find a regular expression that can be used with the split method and that splits the text correctly.

As for splitting text by a regex, that is already done in the snippet above. The suggested question is more general, whereas what I am looking for is specific to URLs.

@WiktorStribiżew It's working only on the links that begin with 'http://' or 'https://' — Strider, Feb 22 '20 at 22:52
@LucaKiebel That's what I am applying, but the problem is that the regular expression is not correct, and I don't get the expected result — Strider, Feb 22 '20 at 22:54
A link DOES start with http or https !!! Is this a link: abcd.ef ? NO — Poul Bak, Feb 22 '20 at 22:54
@PoulBak Sure, but my objective is to detect anything in the text that might be referencing a link, and if it does not contain a protocol, one can be prepended programmatically. — Strider, Feb 22 '20 at 23:01
URLs are *very* complex beasts, with lots of little-used options and corner cases. Be careful! — vonbrand, Feb 23 '20 at 02:30
Why is this flagged as a duplicate? They're completely different problems. — Laurensius Adi, Jan 22 '21 at 06:43
Since I can't answer anymore, I'll put my answer here. `text.split(/(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[\da-zA-Z]{2,}\b([-a-zA-Z0-9@:%_+.~#?&//=,]*))/gi).filter(Boolean).filter(s => s.indexOf('/') !== 0 )` — Laurensius Adi, Jan 22 '21 at 06:44
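The idea from the comments of prepending a protocol programmatically when a detected link has none can be sketched as below. `withProtocol` is a hypothetical helper, and `"http://"` as the default scheme is an assumption; pass `"https://"` if that suits better.

```javascript
// Hypothetical helper: prepends a default scheme when a detected link has none.
// Assumption: "http://" is an acceptable default; override via the second argument.
function withProtocol(link, defaultScheme = "http://") {
  // Links that already start with http:// or https:// (any case) are returned unchanged.
  return /^https?:\/\//i.test(link) ? link : defaultScheme + link;
}

console.log(withProtocol("www.link3.com"));          // "http://www.link3.com"
console.log(withProtocol("https://www.link2.com"));  // "https://www.link2.com"
console.log(withProtocol("link-4.com", "https://")); // "https://link-4.com"
```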

score -1 · Answer 1 · answered Feb 22 '20 at 23:13

It's not ideal, and it would be hard to find or create a perfect regex for this and test it against every case, but you can quickly write something like this:

var text2 = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";

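console.log(text2
    .split(/(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/ig)
    .filter(Boolean)                              // drop empty strings and undefined group captures
    .filter((x)=>{ return x.indexOf('.')>0 }));   // keep only parts that contain a dot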

Thanks for the answer, but the `filter` method eliminates the text parts that are not links, like — Strider, Feb 22 '20 at 23:19
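A sketch that keeps the non-link text, assuming the same loose notion of a link the question uses: an optional `http(s)://`, a dotted host name, and an optional path containing no spaces or commas. Compared with the answer's regex, it uses a single capturing group, so `split` does not splice in URL fragments, and it filters out only empty strings instead of everything without a dot. It is a heuristic, not a URL validator.

```javascript
const text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";

// One capturing group around the whole URL; every inner group is non-capturing,
// so split() inserts each URL exactly once and nothing else.
//   (?:https?:\/\/)?     optional scheme
//   [\w-]+(?:\.[\w-]+)+  host with at least one dot (a trailing "." is not consumed)
//   (?:\/[^\s,]*)?       optional path/query, stopping at whitespace or a comma
const urlPattern = /((?:https?:\/\/)?[\w-]+(?:\.[\w-]+)+(?:\/[^\s,]*)?)/i;

const parts = text
  .split(urlPattern)
  // Drop only empty strings (split produces them when a URL sits at the very
  // start or end of the text); ordinary text between URLs is kept.
  .filter((part) => part !== "");

console.log(parts);
```

On the example text this yields the array given in the question. Being a heuristic, it will also match dotted tokens that are not links, such as `e.g` or `3.14`, and it cuts a path at the first comma even where the comma belongs to the URL.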
