3

My Java code fails when a String is converted into an actual Path on a Unix system:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /out/K/Kyuss/?And the Circus Leaves Town/09 - Size Queen.mp3
    at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
    at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
    at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
    at java.io.File.toPath(File.java:2234)
    at com.jthink.songkong.analyse.analyser.SongSaver.saveRenamedFile(SongSaver.java:891)
    at com.jthink.songkong.analyse.analyser.SongSaver.realSave(SongSaver.java:809)
    at com.jthink.songkong.analyse.analyser.SongSaver.saveSongToFile(SongSaver.java:630)
    at com.jthink.songkong.analyse.analyser.SongSaver.saveChanges(SongSaver.java:190)
    at com.jthink.songkong.analyse.analyser.SongSaver.call(SongSaver.java:165)
    at com.jthink.songkong.analyse.analyser.SongSaver.call(SongSaver.java:59)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The problem character is the ellipsis character '…' (shown as ? in the error message output), which is not an 8-bit character. But why does it need to be? I wasn't aware there was such a restriction on Unix systems.

Paul Taylor
  • Linux does not like spaces at all (not that it doesn't allow them, but they are a pain with some programs and don't work 100% with everything). If you put in an underscore instead of a space, the problem will most likely be solved. Or it could be a problem with the `/`: it may think that is being used in a filename and not a directory path. – jgr208 Sep 29 '16 at 16:45
  • @jgr208 Thanks, but you are incorrect: the issue is with the ellipsis character, as explained in the question. – Paul Taylor Sep 29 '16 at 17:20
  • Was the file meant for a Linux machine, or was it intended for Windows at first? – jgr208 Sep 29 '16 at 17:29
  • @jgr208 It was always intended for Linux, but it has a new filename constructed from metadata, and the metadata contains the ellipsis character. All I want to know is in what circumstances the ellipsis character is valid or not valid on Linux. – Paul Taylor Sep 30 '16 at 13:22
  • I just entered an ellipsis on my Linux system and had no trouble displaying the file name: `[root@local untitled folder]# find . ./...And the ./...And the/a ` – jgr208 Sep 30 '16 at 14:56
  • @jgr208 Yes, nor do I. Thanks for your help, but you are totally misunderstanding my question. – Paul Taylor Oct 01 '16 at 11:04
  • How am I misunderstanding? You are asking whether the Linux OS limits the file name. I don't think Linux is the root of the problem; Java most likely is. – jgr208 Oct 03 '16 at 14:23
  • Run `echo $LANG`. What is the output? – jgr208 Oct 03 '16 at 15:51
  • @jgr208 It's set to nothing! – Paul Taylor Oct 13 '16 at 18:44

1 Answer

12

Linux treats filenames as byte strings; it is up to each application to decide how to interpret those bytes. Programs typically interpret filenames as UTF-8, but this depends on many factors, including locale environment variables such as LANG.
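To make the "byte strings" point concrete, here is a minimal sketch of how the same name fares under two charsets. The class name FilenameBytes and the helper canEncode are made up for illustration; only the standard java.nio.charset API is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameBytes {
    // True if every character of the name has an encoding in the given charset.
    public static boolean canEncode(Charset cs, String name) {
        return cs.newEncoder().canEncode(name);
    }

    // Number of bytes the name occupies when encoded with the given charset.
    public static int byteLength(Charset cs, String name) {
        return name.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String ellipsis = "\u2026"; // the character from the question
        // UTF-8 stores U+2026 as the three bytes E2 80 A6.
        System.out.println("UTF-8 bytes: " + byteLength(StandardCharsets.UTF_8, ellipsis));
        System.out.println("UTF-8 can encode: " + canEncode(StandardCharsets.UTF_8, ellipsis));
        // US-ASCII has no mapping for it at all.
        System.out.println("ASCII can encode: " + canEncode(StandardCharsets.US_ASCII, ellipsis));
    }
}
```

The filesystem itself would happily store the three UTF-8 bytes; the question is only whether the program is willing to produce them.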

The issue is that Java uses the LANG variable to guess which encoding your filenames use. If it is not set properly (for example, to en_US.UTF-8), Java may assume your filenames are ASCII, and it then refuses to encode the ellipsis character because it has no ASCII encoding.
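If you want to see what the JVM actually picked up, a small diagnostic along these lines may help. The class name ShowEncoding and the method describe are hypothetical, and sun.jnu.encoding is an implementation-specific property of OpenJDK/Oracle JVMs, so it may print null elsewhere.

```java
import java.nio.charset.Charset;

public class ShowEncoding {
    // Summarises the locale-derived settings this JVM started with.
    public static String describe() {
        return "LANG=" + System.getenv("LANG")
            + " file.encoding=" + System.getProperty("file.encoding")
            // Implementation-specific property; may be absent on some JVMs.
            + " sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
            + " defaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with LANG=C and once with LANG=en_US.UTF-8 should show whether the reported encodings change on your system.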

Small example to reproduce it:

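import java.io.File;

public class Test {
    public static void main(String[] args) {
        File f = new File("\u2026"); // a file name consisting only of U+2026, the ellipsis
        f.toPath();                  // throws InvalidPathException if the name cannot be encoded
    }
}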

If you run it with LANG=C you get the error.

$ LANG=C java Test
Exception in thread "main" java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: ?
    at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
    at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
    at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
    at java.io.File.toPath(File.java:2234)
    at Test.main(Test.java:6)

If you run it with LANG=en_US.UTF-8 it works fine.

$ LANG=en_US.UTF-8 java Test
# No crash!

If you run it without setting LANG, it picks up whatever locale your system is configured with, and it will throw if that locale's encoding does not support Unicode.

Unfortunately I see no easy way to fix this behavior from your program. UnixPath.encode uses Charset.defaultCharset() and there's no way to change it at runtime. You'll have to make sure your LANG is properly configured.
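This does not fix the encoding, but your program can at least detect the situation at startup and fail with a clear message instead of partway through a save. A minimal sketch, with the hypothetical names PathEncodingCheck and canMakePath:

```java
import java.io.File;
import java.nio.file.InvalidPathException;

public class PathEncodingCheck {
    // Returns true if this JVM can turn the given name into a Path.
    public static boolean canMakePath(String name) {
        try {
            new File(name).toPath();
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe with the ellipsis character from the question.
        if (!canMakePath("\u2026")) {
            System.err.println("This JVM cannot encode non-ASCII file names. "
                + "Start it with a UTF-8 locale, e.g. LANG=en_US.UTF-8");
            System.exit(1);
        }
        System.out.println("File name encoding looks fine.");
    }
}
```

The probe only creates a Path object in memory; it does not touch the filesystem.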

Dirbaio