When walking the file tree of a Linux machine (and perhaps other Unix systems), I can encounter file or directory names that are encoded in a different encoding from the one returned by Java's default charset (Charset.defaultCharset()). This happens because a user can change his locale and then create a file or directory whose name is encoded in that custom locale.
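To illustrate why the default charset matters, here is a minimal sketch that needs no file system. The byte values are a hypothetical ISO-8859-8 directory name (the Hebrew word שלום). Decoding them with the matching charset and re-encoding gives back the same bytes, while decoding them as UTF-8 does not. The class and method names are illustrative, and the sketch assumes the JDK includes ISO-8859-8, which is an extended charset rather than one of the guaranteed StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NameRoundTrip {
    // Hypothetical on-disk bytes of the Hebrew name "שלום" encoded in ISO-8859-8
    static final byte[] RAW = {(byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED};

    /** Decodes raw with cs, re-encodes the String with cs, and checks the bytes survived. */
    static boolean roundTrips(byte[] raw, Charset cs) {
        String decoded = new String(raw, cs);
        return Arrays.equals(raw, decoded.getBytes(cs));
    }

    public static void main(String[] args) {
        // ISO-8859-8 is an extended charset, assumed to be present in this JDK
        Charset hebrew = Charset.forName("ISO-8859-8");
        System.out.println("ISO-8859-8: " + roundTrips(RAW, hebrew));                // prints true
        System.out.println("UTF-8:      " + roundTrips(RAW, StandardCharsets.UTF_8)); // prints false
    }
}
```

Decoding these bytes as UTF-8 replaces each invalid sequence with U+FFFD, so once the name has become a String the original bytes can no longer be recovered from it.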
I want to walk the entire file system (for example using Files.walkFileTree) and, for each file or directory encountered, always be able to store something from which I can later create a new Path object that successfully resolves to that file or directory.
However, if I encounter a file or directory whose name is encoded in an unknown encoding, I am unable to access it again.
To demonstrate the problem: on a RHEL6 machine I have a directory under "/home/languages" whose name is encoded in he_IL.iso88598, and the system locale is also he_IL.iso88598. The following code builds the name twice: once as Java decoded it using the platform's default encoding, and once by encoding that String as UTF-8 and decoding the resulting bytes back with the default encoding:
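    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    Path source = Paths.get("/home/languages");
    Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult preVisitDirectory(Path dir,
                BasicFileAttributes attrs) throws IOException {
            // Encode the decoded name as UTF-8, then decode those bytes with the default charset
            String badName = new String(dir.toString().getBytes("UTF8"));
            // The name exactly as Java decoded it with the default charset
            String name = dir.toString();
            Files.exists(Paths.get(name));
            Files.exists(Paths.get(badName));
            return FileVisitResult.CONTINUE;
        }
    });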
The following exception is thrown:
Exception in thread "main" java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /home/languages/?¢??¨??×
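The wording of this message matches what a strict CharsetEncoder reports. A minimal sketch, assuming (as OpenJDK's Unix provider appears to do) that Paths.get encodes the String with the JVM's file-name charset and rejects characters it cannot map instead of substituting them. The helper name encodesCleanly is hypothetical; it only shows which Strings survive a strict encode to ISO-8859-8.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    /** Returns true if every character of name can be encoded in cs without substitution. */
    static boolean encodesCleanly(String name, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(name));
            return true;
        } catch (CharacterCodingException e) {
            // Assumed to be the same kind of failure that Paths.get turns into InvalidPathException
            return false;
        }
    }

    public static void main(String[] args) {
        // ISO-8859-8 is an extended charset, assumed to be present in this JDK
        Charset hebrew = Charset.forName("ISO-8859-8");
        System.out.println(encodesCleanly("\u05E9\u05DC\u05D5\u05DD", hebrew)); // Hebrew letters: prints true
        System.out.println(encodesCleanly("\uFFFD", hebrew));                   // replacement char: prints false
    }
}
```

In the code above, decoding the UTF-8 bytes with ISO-8859-8 can yield U+FFFD for byte values the charset does not define, and such a String cannot be encoded back, which is consistent with the exception being raised for badName.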
During walkFileTree, Java is able to access files and directories even when their names use an unknown encoding, so why can't I access those paths again afterwards?
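One approach worth testing, offered as a hedged sketch: instead of storing dir.toString(), store dir.toUri() and rebuild the path later with Paths.get(URI). In OpenJDK's Unix provider the Path object keeps the name's raw bytes, and toUri() is understood to percent-encode those bytes, so the URI should survive even when the name is not valid in the default charset. That is implementation behaviour to verify on your JDK, not a documented guarantee. Keeping the Path objects themselves also works within one JVM run, but Path is not Serializable. The names UriRoundTrip and collectDirUris are illustrative, and the demo only uses ordinary temp directories.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class UriRoundTrip {
    /** Walks the tree under root and records each directory as a URI rather than a String. */
    static List<URI> collectDirUris(Path root) throws IOException {
        List<URI> uris = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Assumption: toUri() percent-encodes the raw name bytes on the Unix provider
                uris.add(dir.toUri());
                return FileVisitResult.CONTINUE;
            }
        });
        return uris;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("walk");
        Files.createDirectories(root.resolve("a").resolve("b"));
        for (URI uri : collectDirUris(root)) {
            // Later: rebuild the Path from the stored URI instead of from a decoded String
            Path again = Paths.get(uri);
            System.out.println(uri + " exists: " + Files.exists(again));
        }
    }
}
```

The URI is plain ASCII, so it can be written to a file or database and read back without depending on the locale that was active when the name was created.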
Thanks