I've got some JavaScript that looks for Amazon ASINs within an Amazon link, for example

http://www.amazon.com/dp/B00137QS28

For this I use the following regex: /([A-Z0-9]{10})

However, I don't want it to match artist links which look like:

http://www.amazon.com/Artist-Name/e/B000AQ1JZO

So I need to exclude any links where there's a '/e' before the slash and the 10-character alphanumeric code. I thought the following would do that: (?<!/e)([A-Z0-9]{10}), but it turns out negative lookbehinds don't work in JavaScript. Is that right? Is there another way to do this instead?

Any help would be much appreciated!

As a side note, be aware that there are plenty of Amazon link formats, which is why I want to blacklist rather than whitelist. For example, these are all the same page:

http://www.amazon.com/gp/product/B00137QS28/
http://www.amazon.com/dp/B00137QS28
http://www.amazon.com/exec/obidos/ASIN/B00137QS28/
http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/
Alan Moore
Pete Williams
  • yep, negative lookbehinds are not supported. – Felix Kling Feb 11 '12 at 00:19
  • Negative lookbehinds aren't directly supported in JS, but there are decently simple ways to implement their logic. [This question](https://stackoverflow.com/questions/641407/javascript-negative-lookbehind-equivalent) is the master question for that sort of thing. I gave [a more comprehensive answer](https://stackoverflow.com/questions/35142364/regex-negative-lookbehind-not-valid-in-javascript/35143111#35143111) elsewhere. – Adam Katz Feb 09 '16 at 02:00

3 Answers

In your case an expression like this would work:

/(?!\/e)..\/([A-Z0-9]{10})/
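
As a quick check, here is a minimal sketch that runs this pattern over the URLs from the question. The helper name `extractAsin` and the `urls` array are assumptions made for illustration, not part of the answer itself.

```javascript
// Pattern from the answer: the two characters before the slash must not be "/e".
var asinPattern = /(?!\/e)..\/([A-Z0-9]{10})/;

// Hypothetical helper: returns the captured ASIN, or null when nothing matches.
function extractAsin(url) {
  var m = asinPattern.exec(url);
  return m ? m[1] : null;
}

var urls = [
  'http://www.amazon.com/gp/product/B00137QS28/',
  'http://www.amazon.com/dp/B00137QS28',
  'http://www.amazon.com/exec/obidos/ASIN/B00137QS28/',
  'http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/',
  'http://www.amazon.com/Artist-Name/e/B000AQ1JZO' // artist link, should be skipped
];

var results = urls.map(extractAsin);
```

With these inputs the four product links should each yield `B00137QS28`, and the artist link should yield `null`.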
Qtax
  • Nice. Maybe `(^[\s\S]?|(?!\/e)[\s\S]{2})` so that it can match near the beginning of the input and the beginning of a line. – Mike Samuel Feb 11 '12 at 18:51

([A-Z0-9]{10}) will work equally well on the reverse of its input, so you can

  1. reverse the string,
  2. use a negative lookahead to reject any code that is followed by /e/ in the reversed string,
  3. reverse the match back.
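
The reversal idea could be sketched roughly as follows. The function names `reverse` and `extractAsinReversed` are hypothetical, and the lookahead here is a negative one, because the aim is to reject codes that sit after `/e/` in the original URL. The simple `split`/`reverse`/`join` reversal assumes plain ASCII input such as these URLs.

```javascript
// Hypothetical helper: reverse a string (adequate for ASCII-only URLs).
function reverse(s) {
  return s.split('').reverse().join('');
}

// In the reversed string the code comes first, then "/", and it must not
// be followed by "e/" (which is "/e" read backwards).
var reversedPattern = /([A-Z0-9]{10})\/(?!e\/)/;

// Hypothetical helper: returns the ASIN, or null for artist links.
function extractAsinReversed(url) {
  var m = reversedPattern.exec(reverse(url));
  return m ? reverse(m[1]) : null;
}

var productAsin = extractAsinReversed('http://www.amazon.com/dp/B00137QS28');
var artistAsin = extractAsinReversed('http://www.amazon.com/Artist-Name/e/B000AQ1JZO');
```

Here `productAsin` should come out as `B00137QS28` and `artistAsin` as `null`.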
Mike Samuel

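You need to use a lookahead to filter the /e/* ones out. Then trim the three leading characters (the two characters before the slash, plus the slash itself) from each of the matches.

var source = 'http://www.amazon.com/dp/B00137QS28'; // the string you're matching against the RegExp
// Match two characters that are not "/e", then a slash and the 10-character code.
var matches = source.match(/(?!\/e)..\/[A-Z0-9]{10}/g) || [];
// Drop the three leading characters (e.g. "dp/") to leave just the code.
var ids = matches.map(function (match) {
  return match.slice(3);
});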
J. K.