
For example, I want to match the following:

  • file1.js
  • file2.src.js
  • file3.bin.src.js
  • file1.binbin.js
  • file1.in.js
  • file1.b.js

But not:

  • file1.bin.js
  • file2.src.bin.js

I have the following solutions so far:

  • ^(?!.+\.bin\.js$).+\.js$ (https://regex101.com/r/iR6yC9/1). The problem with this approach is that both .+ and \.js$ are spelled out twice, so it feels verbose and redundant.

  • ^(?:(?!\.bin\.js$).)+\.js$ (https://regex101.com/r/zQ1kE0/1). The problem with this second approach is that the lookahead inside the non-capturing group makes it less readable, although it does 'reuse' the .+

I feel neither solution is ideal. Is there a regex that solves this problem while being more readable and less redundant?
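For reference, here is a minimal sketch that runs both patterns above against the file names listed in the question. The helper name `checkPattern` and the labels in `candidates` are hypothetical, chosen only for this illustration.

```javascript
// File names copied from the two lists in the question.
const shouldMatch = [
  "file1.js", "file2.src.js", "file3.bin.src.js",
  "file1.binbin.js", "file1.in.js", "file1.b.js",
];
const shouldNotMatch = ["file1.bin.js", "file2.src.bin.js"];

// The two patterns from the question; the labels are made up for this sketch.
const candidates = {
  lookaheadAtStart: /^(?!.+\.bin\.js$).+\.js$/,
  lookaheadPerChar: /^(?:(?!\.bin\.js$).)+\.js$/,
};

// Hypothetical helper: true when the pattern accepts every wanted name
// and rejects every unwanted one.
function checkPattern(re) {
  return shouldMatch.every((name) => re.test(name)) &&
    shouldNotMatch.every((name) => !re.test(name));
}

for (const [label, re] of Object.entries(candidates)) {
  console.log(label, checkPattern(re)); // true for both patterns
}
```

Both patterns agree on these samples, so the choice between them comes down to readability.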

Liang Zhou
  • You need to use negative lookbehind, but JavaScript doesn't have it. – Barmar Jun 17 '16 at 23:21
  • I think your 'problem' of repeating parts such as ".js" is a non-issue. It is easier to read and probably faster than alternative expressions. – le_m Jun 17 '16 at 23:40

2 Answers


If JavaScript had negative lookbehind, that would be the way to do it. Since it doesn't, the most readable solution (IMHO) is to use two regular expressions.

if (/\.js$/.test(filename) && !/\.bin\.js$/.test(filename))
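As a rough sketch, the two-test condition above could be wrapped in a small predicate. The function name `isNonBinJs` is hypothetical and not part of any existing code.

```javascript
// Hypothetical helper wrapping the two tests from the condition above:
// the name must end in ".js" but must not end in ".bin.js".
function isNonBinJs(filename) {
  return /\.js$/.test(filename) && !/\.bin\.js$/.test(filename);
}

console.log(isNonBinJs("file1.js"));        // true
console.log(isNonBinJs("file1.binbin.js")); // true
console.log(isNonBinJs("file1.bin.js"));    // false
```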

If you can do it in another language that has negative lookbehind, it would be:

/(?<!\.bin)\.js$/

If you read "javascript regex - look behind alternative?" you'll see that your original expression is one of the common workarounds for this missing feature.
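Lookbehind assertions were later added to JavaScript in ECMAScript 2018, so the lookbehind pattern above can be tried directly in an engine that implements them (recent Node.js versions do). The sketch below assumes such an engine; in an older one the regex literal is a syntax error. The variable names are made up for this illustration.

```javascript
// Assumes an engine with ES2018 lookbehind support.
const notBinJs = /(?<!\.bin)\.js$/;

// File names copied from the question.
const wanted = [
  "file1.js", "file2.src.js", "file3.bin.src.js",
  "file1.binbin.js", "file1.in.js", "file1.b.js",
];
const unwanted = ["file1.bin.js", "file2.src.bin.js"];

console.log(wanted.every((name) => notBinJs.test(name)));  // true
console.log(unwanted.some((name) => notBinJs.test(name))); // false
```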

Barmar
  • Thanks. Using two regexes is not an option for me, unfortunately. What would be a good solution using negative lookbehind, supposing I am not using JavaScript? – Liang Zhou Jun 17 '16 at 23:29
  • @LiangZhou Why isn't it an option? What aren't you telling us? – Dave Newton Jun 17 '16 at 23:37
  • @DaveNewton, because I don't own the code, so I can't change the logic. The regex is configurable, so that is what I can change. The code currently takes a configurable regex and tests it against file names, so it has to be a single-regex solution. – Liang Zhou Jun 17 '16 at 23:42

This kind of regex is counterintuitive at first until you've had to wrangle similar ones a few times, then it makes sense but takes a bit of fiddling.

Anyway this should do the trick.

/.*(?:[^n]|[^i]n|[^b]in|[^.]bin)\.js$/

Try it out on regex101.com
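A quick way to sanity-check the pattern above is to run it against the question's samples. The sketch below also tries a few very short names; those are extra inputs chosen for illustration, not cases from the question. Each alternative needs at least one character in front of a trailing "n", "in" or "bin", so names like these fall through.

```javascript
const alternation = /.*(?:[^n]|[^i]n|[^b]in|[^.]bin)\.js$/;

// File names copied from the question.
const wanted = [
  "file1.js", "file2.src.js", "file3.bin.src.js",
  "file1.binbin.js", "file1.in.js", "file1.b.js",
];
const unwanted = ["file1.bin.js", "file2.src.bin.js"];

console.log(wanted.every((name) => alternation.test(name)));  // true
console.log(unwanted.some((name) => alternation.test(name))); // false

// Extra inputs (not from the question): none ends in ".bin.js",
// yet the alternation rejects all three.
const shortNames = ["n.js", "in.js", "bin.js"];
for (const name of shortNames) {
  console.log(name, alternation.test(name)); // false for each
}
```

Whether such short names matter depends on the files you actually have; the lookahead patterns from the question accept them.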

hippietrail
  • This should do the trick of being "more readable, less redundant"? ^^ Small issue: it matches `bla.js\n.js` - apart from that it is probably as good as it gets... – le_m Jun 18 '16 at 06:35
  • No it just looks redundant. You have filenames with literal newlines in them? Or a string with a literal backslash followed by an `n`? – hippietrail Jun 18 '16 at 09:28
  • To exclude a file named just `.js` [change the initial `.*` to `.+`](https://regex101.com/r/zG2hI7/3) - Matching across linebreaks is a kind of quirk of using regex101.com, since it seems to lack a mode where each line would be an independent string instead of just part of a large multiline string. If you are not matching in a large multiline string you won't have to worry about this. If you are matching in a multiline text then it gets even trickier, and that should be part of your question since it's an important complication. – hippietrail Jun 18 '16 at 09:34
  • `'bla.js\n.js'.match(/.+(?:[^n]|[^i]n|[^b]in|[^.]bin)\.js$/gm)` returns `["bla.js\n.js"]` - it matches over two lines. – le_m Jun 18 '16 at 12:24
  • Yes because of the `/gm` flags. I still have no idea if OP needs to deal with filenames such as `'bla.js\n.js'` – hippietrail Jun 18 '16 at 15:10
  • `'bla.js\n.js'.match(/.+(?:[^n]|[^i]n|[^b]in|[^.]bin)\.js$/)` also matches `'bla.js\n.js'` - without any flag. I assume OP doesn't want to deal with such filenames since his reference regex doesn't match them. – le_m Jun 18 '16 at 16:51
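The newline behaviour discussed in these comments can be reproduced with the short sketch below. A negated class such as `[^n]` also accepts a line break, which is how the match crosses two lines even without flags. The `noNewline` variant is only one possible adjustment, an assumption of this sketch and not something proposed in the thread: it anchors the pattern with `^` and adds `\n` to each negated class.

```javascript
// Pattern from the comments (with the initial .* changed to .+).
const original = /.+(?:[^n]|[^i]n|[^b]in|[^.]bin)\.js$/;

// Hypothetical variant: anchored, and no negated class may consume "\n".
const noNewline = /^.+(?:[^n\n]|[^i\n]n|[^b\n]in|[^.\n]bin)\.js$/;

const twoLines = "bla.js\n.js";

console.log(original.test(twoLines));  // true: [^n] consumes the "\n"
console.log(noNewline.test(twoLines)); // false

// The variant still behaves the same on ordinary names.
console.log(noNewline.test("file1.binbin.js")); // true
console.log(noNewline.test("file1.bin.js"));    // false
```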