I was trying to create a regular expression in Java to match a file path like

C:\abc\def\ghi\abc.txt

I tried ([a-zA-Z]:)?(\\[a-zA-Z0-9_-]+)+\\?, as in the following code:

import java.util.regex.Pattern;

public class RETester {

    public static void main(String[] args) {

        String regularExpression = "([a-zA-Z]:)?(\\[a-zA-Z0-9_-]+)+\\?";

        String path = "D:\\directoryname\\testing\\abc.txt";

        Pattern pattern = Pattern.compile(regularExpression);

        boolean isMatched = Pattern.matches(regularExpression, path);
        System.out.println(path);
        System.out.println(pattern.pattern());
        System.out.println(isMatched);
    }
}

However, it always gives me false as the result.

Yu Hao
Jijoy

9 Answers

Java uses backslash escaping in string literals too, so you need to escape your backslashes twice: once for the Java string, and once for the regex.

"([a-zA-Z]:)?(\\\\[a-zA-Z0-9_.-]+)+\\\\?"

Your regex matched the literal text '[a-zA-Z0-9_-]' (the escaped '[' no longer opens a character class) and a literal '?' at the end. I also added a period to the character class to allow 'abc.txt'.
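To see both problems at once, the sketch below compares the question's string literal with the corrected one. The class name `EscapingDemo` is hypothetical; the second test input is simply the literal text that the unescaped version actually matches.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    public static void main(String[] args) {
        // Question's literal: the regex engine receives \[ ... ]+ \? so '[' is a
        // literal bracket and no character class is formed.
        String original = "([a-zA-Z]:)?(\\[a-zA-Z0-9_-]+)+\\?";
        // Corrected literal: "\\\\" in Java source becomes \\ in the regex,
        // which matches one literal backslash.
        String fixed = "([a-zA-Z]:)?(\\\\[a-zA-Z0-9_.-]+)+\\\\?";

        String path = "D:\\directoryname\\testing\\abc.txt";

        System.out.println(original);                                      // the regex as the engine sees it
        System.out.println(Pattern.matches(original, path));               // false
        System.out.println(Pattern.matches(original, "D:[a-zA-Z0-9_-]?")); // true: literal text
        System.out.println(Pattern.matches(fixed, path));                  // true
    }
}
```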

That said, consider using another mechanism for determining valid file names, as there are different schemes (e.g. Unix). java.io.File will probably throw an exception if the path is invalid, which might be a good alternative, although I don't like using exceptions for control flow...

falstro
  • Sadly no, `java.io.File` will accept gibberish paths in its constructor without throwing an exception. – finnw Dec 20 '10 at 13:57
  • Any language or system that demands qqqquuuuaaaaddddrrrruuuupppplllleeeedddd slackbashes is *ipso facto* brain-damaged beyond all hope of redemption. I would get a new job before I suffered such outrages. It’s better than going postal. – tchrist Dec 20 '10 at 21:08
  • You might need the space character: ([a-zA-Z]:)?(\\\\[\\w\\. _-]+)+\\\\? – pikachu Apr 02 '13 at 23:35
  • Your regex is wrong because it also matches this string: c:\test.txt\hhh – To Kra Oct 24 '14 at 08:48
  • @ToKra and your point is? It is a valid path, so it should match. – falstro Oct 24 '14 at 08:52
  • @falstro It actually ***isn't*** a valid path. Files can only be "leaves", that is, they can't have children, so [test.txt\hhh] is impossible. --- While that is not a problem for files that already exist within the file-system, because the situation would never happen, it might be a problem if you consider the use of the pattern as a validation phase before trying to create the files and dirs described by the path. – CosmicGiant Dec 12 '14 at 00:40
  • @TheLima you are ostensibly correct, but your assumption that test.txt is a file is wrong. There's nothing keeping you from creating a directory called "test.txt" and a file in that directory called "hhh". – falstro Dec 12 '14 at 05:24
  • @falstro My apologies. I was pretty sure directories didn't allow the dot because it was used for file extensions, but I was wrong. Never noticed directories could have file-like names. [+1] --- Suggestions: 1) I'm not sure if the dot in [a-zA-Z0-9_.-] should be escaped, but I think it should; otherwise it normally becomes a "match-any-char". 2) [a-zA-Z0-9_] can be replaced by [\\w]. 3) The match seems to always require a "\" at the start; I think the first "\\\\" should go to the end of the capture group and take over the last one's [?]. --- Pretty much the same suggestions as pikachu's. – CosmicGiant Dec 12 '14 at 11:12
  • What if the file path is C:/ ? It will validate to false, even though it is a valid path. – Dezso Gabos Oct 04 '18 at 11:05

Use this regex:

"([a-zA-Z]:)?(\\\\[a-zA-Z0-9._-]+)+\\\\?";

I made two changes: you forgot to add . to the character class to match the file name abc.txt, and the backslashes also needed escaping for the Java string (\\\\ instead of \\).

darioo

Since the path contains folders, and a folder name can contain any character other than

? \ / : " * < > |

we can use the regex below to match a directory path (its character class lists the symbols a folder name may contain, although it leaves out the space and the comma):

[A-Za-z]:[A-Za-z0-9\!\@\#\$\%\^\&\(\)\'\;\{\}\[\]\=\+\-\_\~\`\.\\]+
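In Java source, that regex needs every backslash doubled. A minimal sketch (the class and method names `DirectoryRegexDemo` and `isDirectoryPath` are hypothetical) showing the string-literal form and one limitation: because the class contains no space, a folder such as `my folder` is rejected.

```java
import java.util.regex.Pattern;

public class DirectoryRegexDemo {
    // The regex from the answer, with each backslash doubled for the Java string literal.
    static final Pattern DIRECTORY = Pattern.compile(
            "[A-Za-z]:[A-Za-z0-9\\!\\@\\#\\$\\%\\^\\&\\(\\)\\'\\;\\{\\}\\[\\]\\=\\+\\-\\_\\~\\`\\.\\\\]+");

    static boolean isDirectoryPath(String s) {
        // matches() requires the whole string to fit the pattern
        return DIRECTORY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDirectoryPath("C:\\abc\\def\\ghi\\abc.txt")); // true
        System.out.println(isDirectoryPath("C:\\my folder\\abc.txt"));     // false: no space in the class
    }
}
```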
Cjo

It does not match because your regex matches only directory paths, not files. More precisely: it does not accept the dot in your file name.

In addition, there is the escaping problem mentioned by roe.
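To isolate the missing-dot problem, here is a sketch of the question's regex with only the escaping fixed (the class name `MissingDotDemo` is hypothetical). It accepts a directory path but still rejects the file path, because `.` is not in the character class.

```java
import java.util.regex.Pattern;

public class MissingDotDemo {
    // Escaping fixed, but '.' is still absent from the character class.
    static final String NO_DOT = "([a-zA-Z]:)?(\\\\[a-zA-Z0-9_-]+)+\\\\?";

    public static void main(String[] args) {
        System.out.println(Pattern.matches(NO_DOT, "D:\\directoryname\\testing"));          // true
        System.out.println(Pattern.matches(NO_DOT, "D:\\directoryname\\testing\\abc.txt")); // false
    }
}
```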

Ralph

Just saying, one should replace the . in

([a-zA-Z]:)?(\\\\[a-zA-Z0-9_.-]+)+\\\\?

with \\.

In a Java regular expression, . matches any character, while
\. matches only the . character, and in a Java string the backslash itself must be escaped as well.

finbrein
  • `.` inside a character group never means "any character"; the only special characters in character groups are `]` (end of group), `-` (range; only special when used between two characters, not at the beginning or end or between two ranges, so `a-b-c` matches `a`, `b`, `c`, and `-`), and `^` (negate match; only special if used as the first character). – falstro Dec 02 '15 at 13:08
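The comment's point is easy to check: inside a character class `.` is already a literal dot, so escaping it changes nothing, while a bare `.` outside a class matches any character. A small sketch; the class name `DotInClassDemo` is hypothetical.

```java
import java.util.regex.Pattern;

public class DotInClassDemo {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("[.]", "."));   // true: literal dot
        System.out.println(Pattern.matches("[.]", "x"));   // false: '.' in a class is not a wildcard
        System.out.println(Pattern.matches("[\\.]", "x")); // false: escaping it changes nothing here
        System.out.println(Pattern.matches(".", "x"));     // true: outside a class, '.' is any character
    }
}
```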

Here is a correct regex for the Windows file system:

Regular Expression:

(?:[a-zA-Z]\:)\\([\w-]+\\)*\w([\w-.])+  

As a Java string:

"(?:[a-zA-Z]\\:)\\\\([\\w-]+\\\\)*\\w([\\w-.])+"
To Kra

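If it only has to match paths of files on the same machine your app is running on, you can use:

try {
    java.nio.file.Paths.get(yourPath);
    // yourPath is a syntactically valid path on this platform
} catch (java.nio.file.InvalidPathException err) {
    // yourPath cannot be parsed as a path on this platform
}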

So if you're running your app on Windows, the code above will catch invalid Windows paths, and if you're running on Unix, it will catch invalid Unix paths, etc. (On Unix very little is rejected: essentially only a path containing a NUL character.)
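Wrapped as a reusable check, the idea might look like the sketch below. The class and method names `PathCheck` and `isValidPath` are hypothetical; the check only asks whether the default file system can parse the string, not whether the file exists.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

public class PathCheck {
    // Hypothetical helper: true if the platform's default file system can parse
    // the string as a path. It does not check that the file exists.
    static boolean isValidPath(String candidate) {
        try {
            Paths.get(candidate);
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidPath("abc/def.txt"));   // true on Windows and Unix
        System.out.println(isValidPath("abc\0def.txt"));  // false: NUL is rejected on both
    }
}
```

Because the result depends on the operating system, the same string (for example one containing `?` or `*`) can be valid on Unix and invalid on Windows.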

iammyr

There are two reasons why it gives you false. First, you need \\\\ instead of \\, because the backslash has to be escaped once for the Java string and once for the regex. Second, you're missing the dot character; you can insert it before a-z, as in ([a-zA-Z]:)?(\\\\[.a-zA-Z0-9_-]+)+\\\\?
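Where exactly the dot goes in the class does not matter, as long as `-` stays at the end so it is not read as a range. The sketch below (class name `DotPositionDemo` hypothetical) checks three placements of the dot used in this thread against the question's path.

```java
import java.util.regex.Pattern;

public class DotPositionDemo {
    public static void main(String[] args) {
        String path = "D:\\directoryname\\testing\\abc.txt";
        // The '.' can sit anywhere in the class; '-' is kept last so it stays literal.
        String[] variants = {
            "([a-zA-Z]:)?(\\\\[.a-zA-Z0-9_-]+)+\\\\?", // dot first
            "([a-zA-Z]:)?(\\\\[a-zA-Z0-9._-]+)+\\\\?", // dot after the digits
            "([a-zA-Z]:)?(\\\\[a-zA-Z0-9_.-]+)+\\\\?"  // dot just before the final hyphen
        };
        for (String regex : variants) {
            System.out.println(Pattern.matches(regex, path)); // true for each
        }
    }
}
```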

Artur

A nice explanation is given here: https://www.codeproject.com/Tips/216238/Regular-Expression-to-Validate-File-Path-and-Exten

To summarize:

Regex:

^(?:[\w]\:|\\)(\\[a-z_\-\s0-9\.]+)+\.(txt|gif|pdf|doc|docx|xls|xlsx|apk)$

As a Java string:

"^(?:[\\w]\\:|\\\\)(\\\\[a-z_\\-\\s0-9\\.]+)+\\.(txt|gif|pdf|doc|docx|xls|xlsx|apk)$"

It will work for any of these paths:

\\192.168.0.1\folder\file.pdf
\\192.168.0.1\my folder\folder.2\file.gif
c:\my folder\abc abc.docx
c:\my-folder\another_folder\abc.v2.docx
sgupta
  • But not for c:my folder\abc abc.docx. And maybe it should ignore case and leading whitespace. – Erik May 01 '21 at 20:52
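The sketch below runs the Java string from this answer against the four sample paths, the counterexample from the comment, and one upper-case path. The class name `FileExtensionRegexDemo` is hypothetical, and the `CASE_INSENSITIVE` variant is only an illustration of the comment's "ignore case" suggestion, not part of the original answer.

```java
import java.util.regex.Pattern;

public class FileExtensionRegexDemo {
    static final String REGEX =
            "^(?:[\\w]\\:|\\\\)(\\\\[a-z_\\-\\s0-9\\.]+)+\\.(txt|gif|pdf|doc|docx|xls|xlsx|apk)$";
    static final Pattern STRICT = Pattern.compile(REGEX);
    // Illustrative variant: the same regex, ignoring case as the comment suggests.
    static final Pattern IGNORE_CASE = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] samples = {
            "\\\\192.168.0.1\\folder\\file.pdf",
            "\\\\192.168.0.1\\my folder\\folder.2\\file.gif",
            "c:\\my folder\\abc abc.docx",
            "c:\\my-folder\\another_folder\\abc.v2.docx",
            "c:my folder\\abc abc.docx",  // no backslash after the drive: rejected
            "C:\\My Folder\\Report.PDF"   // upper case: rejected unless CASE_INSENSITIVE
        };
        for (String s : samples) {
            System.out.println(s + " -> " + STRICT.matcher(s).matches()
                    + " / " + IGNORE_CASE.matcher(s).matches());
        }
    }
}
```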