1

In JavaScript, I want to extract the non-image URLs from a string, e.g.

http://example.com

http://example.com/a.png

http://www.example.ccom/acd.php

http://www.example.com/b.jpg etc.

I would like to extract the 1st and 3rd (non-image) URLs and ignore the 2nd and 4th (image) URLs.

I tried the following, which did not work:

(https?:)?\/\/?[^\'"<>]+?^(\.(jpe?g|gif|png))

This is a modification of the following image-URL regular expression (RE), to which I added ^() (intended as "not") to get the snippet above:

(https?:)?//?[^\'"<>]+?\.(jpg|jpeg|gif|png)

Note: The REs in the examples above are case-sensitive; any hint on making them case-insensitive would also be appreciated.

Muhammad Rizwan

2 Answers

0

You can use a negative lookahead, which excludes anything containing the given string. Assuming your URLs are newline-delimited as in your example, something like this should work:

(?!.*(jpg|jpeg|gif|png).*).*

EDIT: It looks like my example doesn't work; hopefully it at least points you in the right direction.
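One way the lookahead idea could be made to work is sketched below. It is only a sketch under a few assumptions: URLs are separated by whitespace, an "image URL" means one whose last characters are a dot plus jpg, jpeg, gif or png (so `a.png?x=1` would not count as an image), and the function name `extractNonImageUrls` is made up for illustration. The lookahead sits right after the scheme and requires a literal dot before the extension, so `gifts` in a path is not mistaken for `gif`; the `i` flag makes the match case-insensitive.

```javascript
// Hypothetical helper, not part of any library.
function extractNonImageUrls(text) {
  // After "http://" or "https://", refuse to match if the rest of the
  // whitespace-delimited token ends in .jpg/.jpeg/.gif/.png.
  // (?!\S) means "no further non-space character", i.e. end of the token.
  var re = /https?:\/\/(?!\S*\.(?:jpe?g|gif|png)(?!\S))\S+/gi;
  // match() returns null when nothing matches, so fall back to [].
  return text.match(re) || [];
}

var sample = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg'
].join('\n');

console.log(extractNonImageUrls(sample));
// → [ 'http://example.com', 'http://www.example.ccom/acd.php' ]
```
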

actual_kangaroo
  • Even if it worked, it would not work on URLs like `http://example.com/gifts/index` because `gif` will match in `gifts`. Ideally a negative lookbehind would be what's needed, but JS doesn't support that. – Jerry Jan 04 '14 at 10:20
  • Hi Eru, thanks for the answer, but this did not work for me: `(https?:)?\/\/?[^\'"<>]+(?!.*(jpg|jpeg|gif|png).*).*` returns true for both image and non-image URLs. Maybe I am missing where to use your negation snippet. – Muhammad Rizwan Jan 04 '14 at 10:50
0

First remove the image URLs:

var tmp = text.replace(/https?:\/\/[\S]+\.(png|jpeg|jpg|gif)/gi, '');

Then match the URLs that remain:

var m = tmp.match(/https?:\/\/[\S]+/gi);
console.log(m);
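For reference, the two steps can be put together into something runnable, roughly as below. The function name `removeThenMatch` and the sample text are assumptions for illustration. The sketch also adds `(?!\S)` after the extension, a small change from the snippet above, so that only URLs that actually end in an image extension are removed, and it falls back to an empty array because `match` returns `null` when nothing is found.

```javascript
// Hypothetical wrapper around the remove-then-match approach.
function removeThenMatch(text) {
  // Step 1: blank out whitespace-delimited URLs ending in an image extension.
  var withoutImages = text.replace(/https?:\/\/\S+\.(?:png|jpe?g|gif)(?!\S)/gi, '');
  // Step 2: whatever URLs are left are the non-image ones.
  return withoutImages.match(/https?:\/\/\S+/gi) || [];
}

var text = [
  'http://example.com',
  'http://example.com/a.png',
  'http://www.example.ccom/acd.php',
  'http://www.example.com/b.jpg'
].join('\n');

console.log(removeThenMatch(text));
// → [ 'http://example.com', 'http://www.example.ccom/acd.php' ]
```
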
ZiTAL