JavaScript regular expression for non-image URLs

Question

In JavaScript, I want to extract non-image URLs from a string, e.g.:

http://example.com

http://example.com/a.png

http://www.example.ccom/acd.php

http://www.example.com/b.jpg etc.

I would like to extract the 1st and 3rd (non-image) URLs and ignore the 2nd and 4th (image) URLs.

I tried the following, which did not work:

(https?:)?\/\/?[^\'"<>]+?^(\.(jpe?g|gif|png))

It is a modification of the following image-URL regular expression (RE), to which I added ^() (intending it to mean "not"):

(https?:)?//?[^\'"<>]+?\.(jpg|jpeg|gif|png)

Note: The REs in the examples above are case-sensitive. Any hint on making them case-insensitive would also be appreciated.

Why not try to match those images and then reject them if they match? The syntax `^()` doesn't mean 'not'; `^` anchors the match to the start of the input (or of a line in multiline mode), and the regex then tries to match what's inside the parentheses. — Jerry, Jan 04 '14 at 09:42
An option could be to use curl to check whether the URL is an image. — user3096443, Jan 04 '14 at 09:52
Where should I negate? Could you please provide a modified version of the snippet above? — Muhammad Rizwan, Jan 04 '14 at 11:00
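
The suggestion in the first comment (match every URL, then reject the ones that look like images) could be sketched roughly as follows. The names `URL_RE`, `IMAGE_RE`, `extractNonImageUrls` and `sample` are hypothetical, and the `i` flag is what makes the extension test case-insensitive. This is only a sketch under the assumption that an "image URL" is one whose path ends in `.jpg`, `.jpeg`, `.gif` or `.png`.

```javascript
// Hypothetical names; the sample text mirrors the URLs in the question.
const URL_RE = /https?:\/\/[^\s'"<>]+/gi;
// The `i` flag makes the extension test case-insensitive (.PNG, .Jpg, ...).
const IMAGE_RE = /\.(?:jpe?g|gif|png)$/i;

function extractNonImageUrls(text) {
  // String.prototype.match returns null when nothing matches.
  const urls = text.match(URL_RE) || [];
  // Drop any query string or fragment before testing the extension.
  return urls.filter((url) => !IMAGE_RE.test(url.split(/[?#]/)[0]));
}

const sample = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg etc.',
].join('\n');

console.log(extractNonImageUrls(sample));
// logs [ 'http://example.com', 'http://www.example.ccom/acd.php' ]
```

Splitting the work into "find URLs" and "test the extension" keeps both patterns simple, at the cost of a second pass over the matches.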

score 0 · Answer 1 · edited May 23 '17 at 10:25


You can use a negative lookahead, which excludes anything containing the given string. Assuming your URLs are newline-delimited as in your example, something like this should work:

(?!.*(jpg|jpeg|gif|png).*).*

EDIT: It looks like my example doesn't work; hopefully it at least points you in the right direction.
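
One likely reason the pattern above fails is that its lookahead is not tied to the start of a URL: the engine is free to begin the match a few characters further along (for instance at the `ng` of `.png`), where the lookahead succeeds. A hedged sketch of a lookahead that is evaluated right after `http://` is shown below. The name `NON_IMAGE_URL` and the sample string are assumptions, and the character class `[^\s'"<>]` is borrowed from the question's own pattern.

```javascript
// Hypothetical pattern name. The lookahead sits directly after the scheme,
// so it is checked once per URL, and the extension must be followed by the
// end of the URL (or by ? / #), so "gifts" is not mistaken for "gif".
const NON_IMAGE_URL =
  /https?:\/\/(?![^\s'"<>]*\.(?:jpe?g|gif|png)(?![^\s'"<>?#]))[^\s'"<>]+/gi;

// Assumed sample input built from the URLs in the question.
const lookaheadSample = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg etc.',
].join('\n');

console.log(lookaheadSample.match(NON_IMAGE_URL));
```

This treats any URL containing an image extension immediately before its end, a `?` or a `#` as an image, which may be too strict or too loose for some inputs.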

answered Jan 04 '14 at 09:59 by actual_kangaroo (last edited by Community)

Even if it worked, it would not work on URLs like `http://example.com/gifts/index` because `gif` will match in `gifts`. Ideally a negative lookbehind would be what's needed, but JS doesn't support that. – Jerry Jan 04 '14 at 10:20
Hi Eru, thanks for the answer, but this did not work for me: `(https?:)?\/\/?[^\'"<>]+(?!.*(jpg|jpeg|gif|png).*).*` returns true for both image and non-image URLs. Maybe I am missing where your negation snippet should go. – Muhammad Rizwan Jan 04 '14 at 10:50
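
For readers on newer engines: lookbehind assertions were added to JavaScript in ES2018, so where that is available the negative-lookbehind idea from the first comment can be sketched as below. The name `NON_IMAGE_URL_LB` and the sample string are assumptions. The trailing `(?![^\s'"<>])` matters: without it the engine could backtrack and return a truncated URL such as `http://example.com/a.pn`.

```javascript
// Requires an engine with ES2018 lookbehind support (hypothetical name).
// Match a whole URL, then assert that it does not END in an image extension.
const NON_IMAGE_URL_LB =
  /https?:\/\/[^\s'"<>]+(?<!\.(?:jpe?g|gif|png))(?![^\s'"<>])/gi;

// Assumed sample input; the last entry is the "gifts" case from the comment.
const lookbehindSample = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg etc.',
  'http://example.com/gifts/index',
].join('\n');

console.log(lookbehindSample.match(NON_IMAGE_URL_LB));
```

Unlike a split on `?` or `#`, this version only looks at the very end of the URL, so an image URL with a query string would not be rejected.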

score 0 · Answer 2 · answered Jan 04 '14 at 11:40


First remove the image URLs from the text:

var tmp = text.replace(/https?:\/\/[\S]+\.(png|jpeg|jpg|gif)/gi, '');

Then match the URLs that remain:

var m = tmp.match(/https?:\/\/[\S]+/gi);
console.log(m);
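
The two steps above can be wrapped into one function and run on the question's sample input, roughly as follows. `sampleText` and `nonImageUrlsByRemoval` are hypothetical names, and the lookahead `(?=[?#\s]|$)` is an addition that is not in the answer's pattern: it is there so that a URL ending in something like `.gifts` is not partially removed as if it were `.gif`.

```javascript
// Assumed input built from the URLs in the question.
const sampleText = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg etc.',
].join('\n');

function nonImageUrlsByRemoval(text) {
  // Step 1: blank out image URLs (including any query string or fragment).
  // The (?=[?#\s]|$) boundary is an addition to the answer's pattern.
  const withoutImages = text.replace(
    /https?:\/\/\S+?\.(?:png|jpe?g|gif)(?=[?#\s]|$)\S*/gi,
    ''
  );
  // Step 2: whatever URLs remain are treated as non-image URLs.
  // match() returns null when nothing is found, hence the fallback.
  return withoutImages.match(/https?:\/\/\S+/gi) || [];
}

console.log(nonImageUrlsByRemoval(sampleText));
// logs [ 'http://example.com', 'http://www.example.ccom/acd.php' ]
```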

answered by ZiTAL
