
I have a simple problem that I am struggling with quite a bit. I have several files in a directory, and I read them and process them based on their type (extension). However, as input I receive a path to the file without its extension, so I have to identify the type myself.

example (files):

```
files/file1.txt
files/pic1.jpg
----------------
String path = "files/file1";
String ext = FilenameUtils.getExtension(path); // returns "" (empty string): the path has no extension
```

Is there a way to identify the type of file when the extension is not included in the path?

Smajl
  • You can check the MIME type of the file – Deepika Rajani Apr 14 '15 at 13:13
  • MIME type, or [this](http://commons.apache.org/proper/commons-io/javadocs/api-1.4/org/apache/commons/io/FilenameUtils.html#getExtension%28java.lang.String%29) `getExtension` – Anjula Ranasinghe Apr 14 '15 at 13:14
  • @A_N_Y_R the OP already appears to be using that class... – Reimeus Apr 14 '15 at 13:15
  • I would suggest using [`Files.probeContentType`](https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#probeContentType-java.nio.file.Path-) to try and deduce the mime type. But there is no 100% reliable way of knowing what a file is except trying to read it as something and seeing if there are issues. – Boris the Spider Apr 14 '15 at 13:16
  • Ah, yes. I just stated what I remembered. – Anjula Ranasinghe Apr 14 '15 at 13:17
  • If it's not already in `path`, how is `ext` going to get it? So my question is, how are you getting the filenames, and why is that thing dropping extensions? No point in using something which drops them, and then trying to get it back. – Abhay Apr 14 '15 at 13:17
  • Possible duplicate of http://stackoverflow.com/questions/2729038/is-there-a-java-library-equivalent-to-file-command-in-unix – thst Apr 14 '15 at 13:17
  • @A_N_Y_R I really wish people would stop suggesting the ancient and **obsolete** Apache libraries. Commons IO 2.4 [was released in 2012](http://mvnrepository.com/artifact/commons-io/commons-io/2.4). – Boris the Spider Apr 14 '15 at 13:19
  • And what if there are two files with the same base-name and two different extensions? – Abhay Apr 14 '15 at 13:19
  • @BoristheSpider I have tested this API, and the problem is that it wrongly relies on file extensions. It will, for instance, detect a PNG image named "foo.txt" as `text/plain`. Custom `FileTypeDetector`s are really your only option. – fge Apr 14 '15 at 13:19

3 Answers


Your best bet here is to "do it yourself" by writing your own implementations of `FileTypeDetector` (`java.nio.file.spi.FileTypeDetector`).

Once you have these, you can simply call `Files.probeContentType()`, which returns a string describing the file contents as a MIME type (or `null` if the content type cannot be determined).


The JDK does provide a default implementation, but it basically relies on file extensions: if you have a PNG image named `foo.txt`, the default implementation will return `text/plain` even though the file really is `image/png`.

Which is of course wrong.
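A minimal sketch of the detector approach, assuming Java 9+ (for `InputStream.readNBytes` and the range form of `Arrays.equals`). The class name `MagicNumberFileTypeDetector` is hypothetical, and it only recognizes three well-known signatures (PNG, JPEG, GIF); for anything else it returns `null` so that the next installed detector, or the JDK default, gets a chance. It is not a full libmagic replacement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

/** Hypothetical detector that looks at the first bytes ("magic numbers") of a file. */
public class MagicNumberFileTypeDetector extends FileTypeDetector {

    private static final byte[] PNG  = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF  = {'G', 'I', 'F', '8'};

    @Override
    public String probeContentType(Path path) throws IOException {
        // Read at most 8 bytes: enough for the longest signature checked here.
        byte[] header = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(path)) {
            n = in.readNBytes(header, 0, header.length);
        }
        if (startsWith(header, n, PNG))  return "image/png";
        if (startsWith(header, n, JPEG)) return "image/jpeg";
        if (startsWith(header, n, GIF))  return "image/gif";
        return null; // unknown: let other detectors (or the default) try
    }

    /** True if the first {@code length} bytes of {@code data} begin with {@code magic}. */
    private static boolean startsWith(byte[] data, int length, byte[] magic) {
        return length >= magic.length
            && Arrays.equals(data, 0, magic.length, magic, 0, magic.length);
    }
}
```

For `Files.probeContentType()` to pick it up, the class has to be registered as a service, as fge describes in the comments: a file named `META-INF/services/java.nio.file.spi.FileTypeDetector` on the classpath containing the class's fully qualified name. It can also be called directly, e.g. `new MagicNumberFileTypeDetector().probeContentType(path)`, which reports `image/png` for a PNG file even if it is named `foo.txt`.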


Final note: if all you really have is part of the file name, use `Files.newDirectoryStream()` and give it an appropriate `DirectoryStream.Filter<Path>`. I am not sure why you only have part of it, though.
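A sketch of that `DirectoryStream.Filter<Path>` approach. The class and method names (`PrefixLookup.findByStem`) are hypothetical; the filter keeps regular files whose name is the stem followed by a dot, so `file1` matches `file1.txt` and `file1.jpg` but not `file10.txt`. Several files can match, so it returns a list.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PrefixLookup {

    /** Returns every regular file in the stem's directory named "<stem>.<anything>". */
    static List<Path> findByStem(Path stem) throws IOException {
        // toAbsolutePath() so a bare name like "file1" still has a parent directory
        Path dir = stem.toAbsolutePath().getParent();
        String wanted = stem.getFileName() + ".";

        DirectoryStream.Filter<Path> filter = p ->
            Files.isRegularFile(p) && p.getFileName().toString().startsWith(wanted);

        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, filter)) {
            for (Path p : entries) {
                matches.add(p);
            }
        }
        return matches;
    }
}
```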

fge
  • Is there an example of such an approach somewhere? – Smajl Apr 14 '15 at 13:19
  • Unfortunately, no, there isn't. I intended to try it out, but basically you need a file in `META-INF/services` named `java.nio.file.spi.FileTypeDetector` that lists all your implementations. Those will then be registered by the JVM. – fge Apr 14 '15 at 13:21
  • Sad. But true. This _is_ the only truly robust way. – Boris the Spider Apr 14 '15 at 13:21
  • @BoristheSpider on the other hand writing such a detector might be made easier should there exist a Java library equivalent to libmagic; personally I haven't dug in that direction though. – fge Apr 14 '15 at 13:24

Since you're only given part of the file name, you'll need to search for files that start with that prefix. Note that there could be multiple matches.

Using java.nio.file

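```java
import java.nio.file.*;
import java.util.stream.Stream;

Path   prefix    = Paths.get(path);
Path   directory = prefix.toAbsolutePath().getParent();
String wanted    = prefix.getFileName() + ".";

// Files.list() throws IOException, and the stream must be closed
try (Stream<Path> stream = Files.list(directory)) {
    // Path.startsWith() compares whole path elements, so compare the name as a String
    stream.filter(p -> p.getFileName().toString().startsWith(wanted))
          .forEach(p -> System.out.printf("Found %s%n", p));
}
```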

Using java.io

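```java
import java.io.File;

File   prefix    = new File(path);
File   directory = prefix.getAbsoluteFile().getParentFile();
// listFiles() returns an array (not a List), or null if directory is not a directory
File[] matches   = directory.listFiles((dir, name) ->
                       name.startsWith(prefix.getName() + "."));

if (matches != null) {
    for (File match : matches) {
        System.out.printf("Found %s%n", match);
    }
}
```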
John Kugelman
  • Why use java.io.File in 2015 at all? – fge Apr 14 '15 at 13:37
  • Perhaps my knowledge is out of date. Is `java.io.File` officially or unofficially deprecated these days? – John Kugelman Apr 14 '15 at 14:09
  • It's not deprecated, and in fact many libraries and APIs still require you to use it to some extent or another. @fge is just campaigning for us to stop teaching it to Java newcomers so that it can begin to be phased out. – Boris the Spider Apr 14 '15 at 14:11

`Files.probeContentType(Path)` implements a basic MIME type inquiry that you can use (or extend); its internal details are platform-specific. You can also write a small utility method that walks a `Set` of extensions. Depending on your application, you may need a combination of the two approaches.
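A sketch of such a utility method, assuming Java 9+ for `List.of`. The names `ExtensionProbe`, `resolve` and `KNOWN_EXTENSIONS`, and the extension list itself, are hypothetical; an ordered `List` is used instead of a plain `Set` so the lookup order is predictable when several candidates exist. Once it yields a real `Path`, that path can be handed to `Files.probeContentType()`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

final class ExtensionProbe {

    // Hypothetical: the extensions this application knows how to handle, tried in order.
    static final List<String> KNOWN_EXTENSIONS = List.of("txt", "jpg", "png");

    /** Returns the first "<stem>.<ext>" that exists as a regular file, if any. */
    static Optional<Path> resolve(Path stem, List<String> extensions) {
        for (String ext : extensions) {
            // resolveSibling keeps the directory: files/file1 -> files/file1.txt
            Path candidate = stem.resolveSibling(stem.getFileName() + "." + ext);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

For example, `ExtensionProbe.resolve(Paths.get("files/pic1"), ExtensionProbe.KNOWN_EXTENSIONS)` would find `files/pic1.jpg` from the question's listing.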

The MIME type checker will give different results on different releases and implementations of the JRE, so always have a fallback solution.
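One possible fail-over chain, sketched with a hypothetical helper (`ContentTypes.detect`): ask `Files.probeContentType()` first, fall back to `URLConnection.guessContentTypeFromStream()`, which sniffs the first bytes of a mark-supporting stream for a handful of known signatures, and finally return `application/octet-stream`. If you distrust the platform detector's extension-based answers, swap steps 1 and 2.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContentTypes {

    static String detect(Path file) throws IOException {
        // 1. Installed FileTypeDetectors, then the platform default (may be extension-based).
        String type = Files.probeContentType(file);
        if (type != null) {
            return type;
        }
        // 2. Sniff the first bytes; guessContentTypeFromStream returns null
        //    unless the stream supports mark/reset, hence the BufferedInputStream.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            type = URLConnection.guessContentTypeFromStream(in);
        }
        // 3. Last resort: treat it as generic binary data.
        return type != null ? type : "application/octet-stream";
    }
}
```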

See: http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#probeContentType%28java.nio.file.Path

[EDIT]

This actually does not answer the question posed, as this method needs a full, legal Path object to work on. If you are given just the stem name and the extension is missing, then you have neither an extension to work with nor a valid Path name for Files to work with [and probeContentType() may, in some implementations, just use the extension anyway.]

Without a Path that refers to a real on-disk file the JRE can access, I'm not sure how you can do this, and without an extension you can't do it by hand either. If you don't have a File of some sort, you can't even open it yourself to attempt file type "magic".