1

I know there are easier ways to get file extensions with JavaScript, but partly to practice my regexp skills I wanted to try and use a regular expression to split a filename into two strings, before and after the final dot (. character).

Here's what I have so far:

const myRegex = /^((?:[^.]+(?:\.)*)+?)(\w+)?$/;
// match() returns the full match first, so skip it when destructuring
const [, filename1, extension1] = 'foo.baz.bing.bong'.match(myRegex);
// filename1 = 'foo.baz.bing.'
// extension1 = 'bong'
const [, filename2, extension2] = 'one.two'.match(myRegex);
// filename2 = 'one.'
// extension2 = 'two'
const [, filename3, extension3] = 'noextension'.match(myRegex);
// filename3 = 'noextension'
// extension3 = undefined

I've tried to use a positive lookahead to say 'only match a literal . if it's followed by a word that ends in another .', by changing (?:\.)* to (?:\.(?=\w+.))*:

/^((?:[^.]+(?:\.(?=(\w+\.))))*)(\w+)$/gm

But I want to exclude that final period using just the regexp, and preferably have 'noextension' be matched in the initial group. How can I do that with just a regexp?

Here is my regexp scratch file: https://regex101.com/r/RTPRNU/1

Peter Seliger
AncientSwordRage

4 Answers

1

For the first capture group, you could start the match with 1 or more word characters. Then optionally repeat a . and again 1 or more word characters.

Then you can use an optional non capture group matching a . and capturing 1 or more word characters in group 2.

As the second non-capturing group is optional, the repetition in the first group should be non-greedy. Otherwise group 1 would consume the whole string, extension included.

^(\w+(?:\.\w+)*?)(?:\.(\w+))?$

The pattern matches

  • ^ Start of string
  • ( Capture group 1
    • \w+(?:\.\w+)*? Match 1+ word characters, then lazily (as few times as possible) repeat . and 1+ word characters
  • ) Close group 1
  • (?: Non capture group to match as a whole
    • \.(\w+) Match a . and capture 1+ word chars in capture group 2
  • )? Close non capture group and make it optional
  • $ End of string


const regex = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
[
  "foo.baz.bing.bong",
  "one.two",
  "noextension"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
    console.log(m[2]);
    console.log("----");
  }
});
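
To see why the lazy quantifier in group 1 matters, here is a small sketch that runs the pattern next to a greedy copy of it. The `groups` helper and the `lazy`/`greedy` names are just for illustration and not part of the original pattern.

```javascript
// Same pattern twice; only the quantifier inside group 1 differs.
const lazy = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
const greedy = /^(\w+(?:\.\w+)*)(?:\.(\w+))?$/;

// Hypothetical helper: returns [group 1, group 2], or null when nothing matches.
function groups(regex, s) {
  const m = s.match(regex);
  return m ? [m[1], m[2]] : null;
}

console.log(groups(lazy, "foo.baz.bing.bong"));   // [ 'foo.baz.bing', 'bong' ]
console.log(groups(greedy, "foo.baz.bing.bong")); // [ 'foo.baz.bing.bong', undefined ]
console.log(groups(lazy, "noextension"));         // [ 'noextension', undefined ]
```

With the greedy version, group 1 takes the whole string and the optional extension group never participates in the match.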

Another option, as @Wiktor Stribiżew posted in the comments, is to use a non-greedy dot to match any character for the filename:

^(.*?)(?:\.(\w+))?$
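
A quick sketch of how this variant behaves on names that contain non-word characters, compared with the \w-based pattern. The `splitWith` helper and the sample filenames are made up for illustration.

```javascript
// Lazy "any character" pattern for the name part.
const dotRegex = /^(.*?)(?:\.(\w+))?$/;
// Word-character pattern, for comparison.
const wordRegex = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

// Hypothetical helper: returns [name, extension], or null when nothing matches.
function splitWith(regex, s) {
  const m = s.match(regex);
  return m ? [m[1], m[2]] : null;
}

console.log(splitWith(dotRegex, "my file-v2.tar.gz"));  // [ 'my file-v2.tar', 'gz' ]
console.log(splitWith(wordRegex, "my file-v2.tar.gz")); // null (space and hyphen are not \w)
console.log(splitWith(dotRegex, ".htaccess"));          // [ '', 'htaccess' ]
```

Note the last case: a leading dot is treated as the extension separator, so the name part comes back empty. Whether that is acceptable depends on how you want dotfiles handled.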


The fourth bird
1

Just a late pitch-in on this: I wanted to split a filename into a "name" part and an "extension" part and wasn't able to find a good solution that supported all my test cases. In particular, I wanted a filename starting with "." to be returned whole as the "name", and I wanted to support files without any extension too.

So I'm using this line, which handles all my use cases:

const [name, ext] = (filename.match(/(.+)\.(.+)/) || ['', filename]).slice(1)

This gives the following output:

'.htaccess' => ['.htaccess', undefined]
'foo' => ['foo', undefined]
'foo.png' => ['foo', 'png']
'foo.bar.png' => ['foo.bar', 'png']
'' => ['', undefined]
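
For anyone who wants to check those cases, here is a minimal sketch that wraps the one-liner in a function. The `splitName` name is hypothetical, and the first group is written with a single `.+` (no nested quantifier), on the assumption that this captures the same text for these inputs.

```javascript
// Hypothetical wrapper around the one-liner.
// Falls back to ['', filename] when there is no "name.ext" shape to match,
// so slice(1) leaves just [filename] and ext stays undefined.
function splitName(filename) {
  const [name, ext] = (filename.match(/(.+)\.(.+)/) || ['', filename]).slice(1);
  return [name, ext];
}

const cases = ['.htaccess', 'foo', 'foo.png', 'foo.bar.png', ''];
for (const c of cases) {
  console.log(JSON.stringify(c), '=>', splitName(c));
}
```

The leading-dot case works because `(.+)` needs at least one character before the dot, so '.htaccess' does not match and falls through to the fallback array.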

I find that to be what I want.

Peter Theill
0

If you really want to use regex, I would suggest using two regexes:

// example with 'foo.baz.bing.bong'

const firstString = /^.+(?=\.\w+)./g // match 'foo.baz.bing.' 
const secondString = /\w+$/g   // match 'bong'
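
A rough sketch of how the two patterns could be applied. The `firstMatch` helper is hypothetical; it only guards against `match()` returning null.

```javascript
const firstString = /^.+(?=\.\w+)./g; // everything up to and including the last dot
const secondString = /\w+$/g;         // trailing run of word characters

// Hypothetical helper: first match of a regex, or undefined when there is none.
function firstMatch(regex, s) {
  const m = s.match(regex);
  return m ? m[0] : undefined;
}

console.log(firstMatch(firstString, 'foo.baz.bing.bong'));  // 'foo.baz.bing.'
console.log(firstMatch(secondString, 'foo.baz.bing.bong')); // 'bong'
console.log(firstMatch(firstString, 'noextension'));        // undefined
console.log(firstMatch(secondString, 'noextension'));       // 'noextension'
```

Keep in mind that for a name without a dot the second pattern still matches the whole name, so you would need to check the first result before treating the second as an extension.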
0

How about something more explicit and accurate, without lookarounds ...

  1. named groups variant ... /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/

  2. without named groups ... /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

Both variants above can be shortened from 3 capture groups to 2. In my opinion that makes the regex easier to work with, at the cost of being less readable ...

  1. named groups variant ... /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/

  2. without named groups ... /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/

const testData = [
  'foo.baz.bing.bong',
  'one.two',
  'noextension',
];
// https://regex101.com/r/RTPRNU/5
const regXTwoNamedFileNameCaptures = /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/;
// https://regex101.com/r/RTPRNU/4
const regXTwoFileNameCaptures = /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

// https://regex101.com/r/RTPRNU/3
const regXThreeNamedFileNameCaptures = /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/
// https://regex101.com/r/RTPRNU/3
const regXThreeFileNameCaptures = /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

console.log(
  'based on 2 named file name captures ...\n',
  testData, ' =>',
  testData.map(str =>
    regXTwoNamedFileNameCaptures.exec(str)?.groups ?? {}
  )
);
console.log(
  'based on 2 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      filename,
      extension,
    ] = str.match(regXTwoFileNameCaptures) ?? [];
  //] = regXTwoFileNameCaptures.exec(str) ?? [];

    return {
      filename,
      extension,
    }
  })
);

console.log(
  'based on 3 named file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const {
      filename = '',
      extension = '',
      noextension = '',
    } = regXThreeNamedFileNameCaptures.exec(str)?.groups ?? {};

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
console.log(
  'based on 3 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      noextension = '',
      filename = '',
      extension = '',
    ] = str.match(regXThreeFileNameCaptures) ?? [];
  //] = regXThreeFileNameCaptures.exec(str) ?? [];

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
Peter Seliger