40

I need to extract just the filename (no file extension) from the following path....

\\my-local-server\path\to\this_file may_contain-any&character.pdf

I've tried several things, most based off of something like http://regexr.com?302m5 but can't quite get there

Ben
  • 3
    Which language? Some languages have a method to parse URIs in their standard library. – Felix Kling Feb 20 '12 at 14:59
  • 2
    I'm skeptical a regex would be faster than getting the index of the last path separator, but I could be wrong. – Dave Newton Jul 15 '14 at 23:18
  • This question is vague as it only contains one example of path and filename structure. Regex is used to match and/or capture different structures which have some similarity. – Pan Jan 11 '20 at 20:00

20 Answers

39
^\\(.+\\)*(.+)\.(.+)$

This regex has been tested on these two examples:

\var\www\www.example.com\index.php
\index.php

First block "(.+\\)*" matches directory path.
Second block "(.+)" matches file name without extension.
Third block "(.+)$" matches extension.
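
As a rough illustration of the three blocks, the pattern can be run in JavaScript against the two tested paths. This is only a sketch: the helper name splitFileName is made up for the example, and the last call shows the no-extension case that does not match.

```javascript
// Pattern copied from the answer above.
// Group 1: directory part (unused here).
// Group 2: file name without extension. Group 3: extension.
const pathPattern = /^\\(.+\\)*(.+)\.(.+)$/;

// Hypothetical helper: returns { name, extension }, or null when nothing matches.
function splitFileName(path) {
  const m = pathPattern.exec(path);
  return m === null ? null : { name: m[2], extension: m[3] };
}

console.log(splitFileName("\\var\\www\\www.example.com\\index.php"));
console.log(splitFileName("\\index.php"));
console.log(splitFileName("\\var\\www\\Makefile")); // no extension, so no match (null)
```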

  • 1
    This is a general approach with some issues. This fails on filenames without an extension, which are not uncommon on *NIX systems. Additionally, the question indicated double leading backslash, so I'd possibly add another escaped backslash outside the capture groups. No mention was made of capturing the path or the extension, so it could be simplified. – Pan Jan 11 '20 at 19:28
  • Also fails if the file has no path. – Pan Jan 11 '20 at 19:34
20

This will get the filename but will also get the dot. You might want to truncate the last character from it in your code.

[\w-]+\.

Update

@Geoman if you have spaces in file name then use the modified pattern below

[ \w-]+\.      (space added in brackets)


TheTechGuy
17

This is just a slight variation on @hmd's so you don't have to truncate the .

[ \w-]+?(?=\.)


Really, thanks goes to @hmd. I've only slightly improved on it.
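
For comparison, here is a small JavaScript sketch that runs the trailing-dot pattern and the lookahead variant side by side. The helper names are invented for the example. Note that on the path from the question both return only "character", because & is not in the character class.

```javascript
// Both patterns are copied from the two answers above.
const withDot = /[ \w-]+\./;            // match includes the trailing dot
const withLookahead = /[ \w-]+?(?=\.)/; // lookahead leaves the dot out

// Hypothetical helpers for the comparison.
function nameByTrimming(path) {
  const m = path.match(withDot);
  return m === null ? null : m[0].slice(0, -1); // drop the trailing dot
}

function nameByLookahead(path) {
  const m = path.match(withLookahead);
  return m === null ? null : m[0];
}

const simplePath = "C:\\docs\\my file.pdf";
console.log(nameByTrimming(simplePath));  // "my file"
console.log(nameByLookahead(simplePath)); // "my file"

const questionPath = "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf";
console.log(nameByLookahead(questionPath)); // "character"
```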

campeterson
  • 1
    Fails if path contains a dot or if filename does not have an extension or if there is no path. – Pan Jan 11 '20 at 19:33
9

I use @"[^\\]+$" That gives the filename including the extension.

user890332
  • 1
    I can't believe this was answered 3 hrs ago, thanks a lot! I had to extract the file name with no extension and therefore no dot in the end from S3 resource path. Just had to replace \\ with \/ to work with S3 path, works like a charm! – GSazheniuk Apr 29 '20 at 20:29
  • 2
    Is this correct? Shouldn't be `[^\/]+$`? – FabianoLothor Oct 15 '22 at 16:11
  • 2
    @fabianoLothor - If the url has forward slashes then it's your way. If it's backslashes then my way. The question was backslashes. – user890332 Oct 25 '22 at 20:32
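
If both separators can occur, one option is to put both in the negated class. The JavaScript sketch below does that and, for comparison, also shows the index-based approach mentioned in the comments on the question. Both function names are hypothetical.

```javascript
// Last run of characters that contains neither a backslash nor a forward slash.
const lastSegment = /[^\\\/]+$/;

// Hypothetical helper: file name including extension, or null if the path ends in a separator.
function fileNameWithExtension(path) {
  const m = path.match(lastSegment);
  return m === null ? null : m[0];
}

// Same idea without a regex: cut after the last separator of either kind.
function fileNameByIndex(path) {
  const cut = Math.max(path.lastIndexOf("\\"), path.lastIndexOf("/"));
  return path.slice(cut + 1);
}

console.log(fileNameWithExtension("\\\\my-local-server\\path\\to\\a b.pdf")); // "a b.pdf"
console.log(fileNameWithExtension("s3://bucket/key/report.csv"));             // "report.csv"
console.log(fileNameByIndex("s3://bucket/key/report.csv"));                   // "report.csv"
```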
9

Try this:

[^\\]+(?=\.pdf$)

It matches a run of non-backslash characters that is followed by .pdf at the end of the string.

You can also (and maybe it's even better) put the part you want in a capturing group, like this:

([^\\]+)\.pdf$

But how you refer to this group (the part in parentheses) depends on the language or regexp flavor you're using. In most cases it'll be something like $1 or \1, or the library will provide some method for getting a capturing group by its number after a regexp match.
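
In JavaScript, for instance, the whole match is at index 0 of the match array and the first capturing group is at index 1. A minimal sketch using the path from the question (variable names are illustrative only):

```javascript
const uncPath = "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf";

// Lookahead variant: the whole match is the file name.
const byLookahead = /[^\\]+(?=\.pdf$)/;
const wholeMatch = uncPath.match(byLookahead);

// Capturing-group variant: the file name is group 1.
const byGroup = /([^\\]+)\.pdf$/;
const groupMatch = uncPath.match(byGroup);

console.log(wholeMatch[0]); // "this_file may_contain-any&character"
console.log(groupMatch[1]); // "this_file may_contain-any&character"
```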

KL-7
4

I'm using this regex to replace the filename of the file with index. It matches a contiguous string of characters that doesn't contain a slash and is followed by a . and a string of word characters at the end of the string. It will retrieve the filename including spaces and dots but will ignore the full file extension.

const regex = /[^\\/]+?(?=\.\w+$)/

console.log('/path/to/file.png'.match(regex))
console.log('/path/to/video.webm'.match(regex))
console.log('/path/to/weird.file.gif'.match(regex))
console.log('/path with/spaces/and file.with.spaces'.match(regex))
James Coyle
3

If anyone is looking for a JavaScript regular expression for Windows absolute (and relative) file paths:

var path = "c:\\my-long\\path_directory\\file.html";


(/(\w?\:?\\?[\w\-_\\]*\\+)([\w-_]+)(\.[\w-_]+)/gi).exec(path);

Output is:

[
"c:\my-long\path_directory\file.html", 
"c:\my-long\path_directory\", 
"file", 
".html"
]
Angelo
3

^(.*[\\\/])?(.*?)(\.[^.]*?|)$

example:

/^(.*[\\\/])?(.*?)(\.[^.]*?|)$/.exec("C:\\folder1\\folder2\\foo.ext1.ext")

result:

0: "C:\folder1\folder2\foo.ext1.ext"
1: "C:\folder1\folder2\"
2: "foo.ext1"
3: ".ext"

the $1 capture group is the folder
the $2 capture group is the name without extension
the $3 capture group is the extension (only the last)

works for:

  • C:\folder1\folder2\foo.ext
  • C:\folder1\folder2\foo.ext1.ext
  • C:\folder1\folder2\name-without extension
  • only name
  • name.ext
  • C:\folder1\folder2\foo.ext
  • /folder1/folder2/foo.ext
  • C:\folder1\folder2\foo
  • C:\folder1\folder2\
  • C:\special&chars\folder2\f [oo].ext1.e-x-t
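
A few of these cases can be checked with a short JavaScript sketch. The parse helper is hypothetical; it only wraps the regex above and substitutes an empty string when the folder group does not take part in the match.

```javascript
// Regex copied from the answer above.
const partsPattern = /^(.*[\\\/])?(.*?)(\.[^.]*?|)$/;

// Hypothetical wrapper around the three capture groups.
function parse(path) {
  const m = partsPattern.exec(path);
  if (m === null) return null; // e.g. input containing a line break
  return { folder: m[1] || "", name: m[2], extension: m[3] };
}

const samples = [
  "C:\\folder1\\folder2\\foo.ext1.ext",
  "/folder1/folder2/foo.ext",
  "only name",
  "C:\\folder1\\folder2\\",
  "C:\\special&chars\\folder2\\f [oo].ext1.e-x-t",
];
for (const sample of samples) {
  console.log(sample, "->", parse(sample));
}
```

A name with a leading period, such as .gitignore, comes back with an empty name and the whole string as the extension.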
harold
  • 1
    Test cases are good too. – Nor.Z Dec 22 '22 at 16:08
  • Seems quite robust. However, if the filename has a leading period as part of its name, this regex treats the full name as a file extension with no name. These types of files exist, such as `.Rprofile` or `.gitignore`. I am not sure whether they should be treated as file extensions or filenames though - I am leaning towards the latter. If the second capture group is changed to `(\.?.*?)` it seems to work on all your test cases. Full regex: `^(.*[\\\/])?(\.?.*?)(\.[^.]*?|)$` – Therkel Jul 11 '23 at 08:45
  • Extra test cases: `C:\folder1\folder2\.name-without extension`, `.only name leading period` `.name.ext`. Also, your regex also already handles periods in the directories correctly. See `C:\fol.d.er1\fo.lde.r2/.name- with extension.R` – Therkel Jul 11 '23 at 08:48
2

Here's a slight modification to Angelo's excellent answer that allows for spaces in the path, filename and extension as well as missing parts:

function parsePath (path) {
    var parts = (/(\w?\:?\\?[\w\-_ \\]*\\+)?([\w-_ ]+)?(\.[\w-_ ]+)?/gi).exec(path);
    return {
        path: parts[0] || "",
        folder: parts[1] || "",
        name: parts[2] || "",
        extension: parts[3] || "",
    };
}
moomoo
2

Answer with:

  • File name and directory space support
  • Named capture group
  • Gets unlimited file extensions (captures file.tar.gz, not just file.tar)
  • *NIX and Win support

^.+(\\|\/)(?<file_name>([^\\\/\n]+)(\.)?[^\n\.]+)$

Explanation:

  1. ^.+(\\|\/) Gets anything up to the final / or \ in a file path
  2. (?<file_name> Begin named capture group
  3. ([^\\\/\n]+) Matches anything except a newline or a path separator
  4. (\.)?[^\n\.]+ Not really needed but it works well for issues with odd characters in file names
  5. )$ End named capture group and end line
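
As a sketch of how the named group might be read in JavaScript (assuming an engine that supports named capture groups, ES2018 or later), the group is available on the match's groups object. The helper name fileNameOf is made up for the example. This pattern returns the file name with its extension and needs at least one separator in the path.

```javascript
// Regex copied from the answer above.
const fileNamePattern = /^.+(\\|\/)(?<file_name>([^\\\/\n]+)(\.)?[^\n\.]+)$/;

// Hypothetical helper: file name (with extension), or null when there is no match.
function fileNameOf(path) {
  const m = fileNamePattern.exec(path);
  return m === null ? null : m.groups.file_name;
}

console.log(fileNameOf("C:\\dir\\file.tar.gz"));       // "file.tar.gz"
console.log(fileNameOf("/usr/local/archive.tar.gz")); // "archive.tar.gz"
console.log(fileNameOf("file.txt"));                  // null, there is no path separator
```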

Note that if you're putting this in a string and you need to escape backslashes (such as with C), every backslash has to be doubled, so you'll be using this string:

"^.+(\\\\|\\/)(?<file_name>([^\\\\\\/\\n]+)(\\.)?[^\\n\\.]+)$"

Arcsector
2

If you want to return the file name with its extension, Regex should be as below:

[A-Za-z0-9_\-\.]+\.[A-Za-z0-9]+$

works for

path/to/your/filename.some
path/to/your/filename.some.other
path\to\your\filename.some
path\to\your\filename.some.other
http://path/to/your/filename.some
http://path/to/your/filename.some.other
And so on

This returns the full file name with extension (e.g. filename.some or filename.some.other).


If you want to return file name without the last extension Regex should be as below:

[A-Za-z0-9_\-\.]+(?=\.[A-Za-z0-9]+$)

This returns the full file name without the last extension (e.g. "filename" for "filename.some" and "filename.some" for "filename.some.other").

Jafar Amini
1

This is specific to the pdf extension.

^.+\\([^.]+)\.pdf$


This is specific to any extension, not just pdf.

^.+\\([^.]+)\.[^\.]+$


([^.]+) This is the $1 capture group to extract the filename without the extension.


\\my-local-server\path\to\this_file may_contain-any&character.pdf

will return

this_file may_contain-any&character

Ste
  • Both fail on filenames with multiple dots and on filenames without an extension. – Pan Jan 11 '20 at 19:02
  • And on files without a path. – Pan Jan 11 '20 at 19:49
  • Quit trolling along all the comments. This answers the OPs question. – Ste Jan 11 '20 at 22:56
  • You are correct. I'm sorry if I offended you or anyone else. Many answers here solve the OPs question. My problem is actually with the question and not with the answers. Most of the answers are probably correct according to the question, which is vague. I am new and I should have acted differently. Sorry! – Pan Jan 11 '20 at 23:20
1

try this

[^\\]+$

you can also add extension for specificity

[^\\]+\.pdf$

0

Here is an alternative that works on windows/unix:

"^(([A-Z]:)?[\.]?[\\{1,2}/]?.*[\\{1,2}/])*(.+)\.(.+)"

First block: path
Second block: dummy
Third block: file name
Fourth block: extension

Tested on:

".\var\www\www.example.com\index.php"
"\var\www\www.example.com\index.php"
"/var/www/www.example.com/index.php"
"./var/www/www.example.com/index.php"
"C:/var/www/www.example.com/index.php"
"D:/var/www/www.example.com/index.php"
"D:\\var\\www\\www.example.com\\index.php"
"\index.php"
"./index.php"
cfcm
0

This regular expression extracts the file extension: if group 2 isn't null, it's the extension.

.*\\(.*\.(.+)|.*$)
Simon
0

One more, for a file in a directory or in the root:

   ^(.*\\)?(.*)(\..*)$

for file in dir

Full match  0-17    `\path\to\file.ext`
Group 1.    0-9 `\path\to\`
Group 2.    9-13    `file`
Group 3.    13-17   `.ext`

for file in root

Full match  0-8 `file.ext`
Group 2.    0-4 `file`
Group 3.    4-8 `.ext`
jhenya-d
0

For most cases (Windows or Unix paths, either separator, bare file name, dot, file extension) the following is enough:

 // grab the dir part (1), the bare file name (2) and the extension (3)
 path.replaceAll("""^(.*)[\\/](.*)([.].*)""", "$2")
Yordan Georgiev
0

Direct approach:

To answer your question as it's written, this will provide the most exact match:

^\\\\my-local-server\\path\\to\\(.+)\.pdf$

General approach:

This regex is short and simple, matches any filename in any folder (with or without extension) on both windows and *NIX:

.*[\\/]([^.]+)

If a file has multiple dots in its name, the above regex will capture the filename up to the first dot. This can easily be modified to match until the last dot if you know that you will not have files without extensions or that you will not have a path with dots in it.

If you know that the folder will only contain .pdf files or you are only interested in .pdf files and also know that the extension will never be misspelled, I would use this regex:

.*[\\/](.+)\.pdf$

Explanation:

  • . matches anything except line terminators.
  • * repeats the previous match from zero to as many times as possible.
  • [\\/] matches the last backslash or forward slash (previous ones are consumed by .*). It is possible to omit either the backslash or the forward slash if you know that only one type of environment will be used. If you want to capture the path, surround .* or .*[\\/] in parentheses.
  • Parentheses capture what is matched inside them.
  • [^.] matches anything that is not a literal dot.
  • + repeats the previous match one or more times, as many as possible.
  • \. matches a literal dot.
  • pdf matches the string pdf.
  • $ asserts the end of the string.

If you want to match files with zero, one or multiple dots in their names placed in a variable path which also may contain dots, it will start to get ugly. I have not provided an answer for this scenario as I think it is unlikely.

Edit: To also capture filenames without a path, replace the first part with (?:.*[\\/])?, which is an optional non-capturing group.
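
The general and the .pdf-specific patterns, each with the optional non-capturing prefix from the edit, could be tried in JavaScript roughly as follows. The constant names are illustrative and the sample paths other than the one from the question are invented.

```javascript
// General pattern: file name up to the first dot, path optional.
const upToFirstDot = /(?:.*[\\\/])?([^.]+)/;
// pdf-only pattern: everything between the last separator and ".pdf".
const pdfOnly = /(?:.*[\\\/])?(.+)\.pdf$/;

const questionPath = "\\\\my-local-server\\path\\to\\this_file may_contain-any&character.pdf";

console.log(upToFirstDot.exec(questionPath)[1]);                // "this_file may_contain-any&character"
console.log(upToFirstDot.exec("/home/user/archive.tar.gz")[1]); // "archive"
console.log(upToFirstDot.exec("notes.txt")[1]);                 // "notes"
console.log(pdfOnly.exec("C:\\docs\\report.final.pdf")[1]);     // "report.final"
console.log(pdfOnly.exec("C:\\docs\\report.txt"));              // null
```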

Pan
0

Does this work...

.*\/(.+)$

Posting here so I can get feedback

Noobie
0

Here is a solution to extract the file name without the dot of the extension. I began with the answer from @Hammad Khan and added the dot to the character class, so dots can be part of the file name:

[ \w-.]+\.

Then use a regex lookahead (?= ) for a dot, so the match stops at the last dot (the one before the extension) and the dot does not appear in the result:

[ \w-.]+(?=[.])

Reordered (not necessary, but it looks better):

[\w-. ]+(?=[.])
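
A quick JavaScript check of this idea is sketched below. As an assumption for portability, the hyphen is moved to the end of the character class, since some regex flavors reject a hyphen placed directly after \w. The helper name baseName is hypothetical.

```javascript
// Same set of characters as the pattern above, with the hyphen placed last.
const nameWithDots = /[\w. -]+(?=[.])/;

// Hypothetical helper: first match of the pattern, or null.
function baseName(path) {
  const m = path.match(nameWithDots);
  return m === null ? null : m[0];
}

console.log(baseName("/path/to/weird.file.gif")); // "weird.file"
console.log(baseName("my report.v2.docx"));       // "my report.v2"
console.log(baseName("/pa.th/file.txt"));         // "pa", a dot in a directory name is matched first
```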
yoh