
I'm having an issue reading (and perhaps writing) files on a Linux system using Java. My application complained that it could not read some audio files, and when I looked on the system I noticed that ls -l failed on these files as well. All the problem files are ones whose names contain characters with quotes etc., such as é; files without these characters are okay.

[root@N1-0247 Georges Bizet- Suites from Carmen & L'arlésienne]# pwd
/mnt/disk1/share/import/all/MusicUnmatched/WAV/Yan Pascal Tortelier/Georges Bizet- Suites from Carmen & L'arlésienne
[root@N1-0247 Georges Bizet- Suites from Carmen & L'arlésienne]# ls -l
ls: cannot access 20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV: No such file or directory
ls: cannot access 19 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV: No such file or directory
ls: cannot access 18 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Intermezzo.WAV: No such file or directory
ls: cannot access 17 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Pastorale.WAV: No such file or directory
ls: cannot access 16 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Carillon.WAV: No such file or directory
ls: cannot access 15 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Adagietto.WAV: No such file or directory
ls: cannot access 14 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV: No such file or directory
ls: cannot access 13 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Prélude.WAV: No such file or directory
ls: cannot access 08 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Chanson Du Toréador (Act II).WAV: No such file or directory
ls: cannot access 07 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Dans Bohème (Gypsy Song, Act II).WAV: No such file or directory
ls: cannot access 05 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Seguédille (Act I).WAV: No such file or directory
ls: cannot access 04 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Habeñera (Act I).WAV: No such file or directory
ls: cannot access 02 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Prélude (Prelude To Act I).WAV: No such file or directory
ls: cannot access 01 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Toréadors (Introduction To Act I).WAV: No such file or directory
total 192148
?????????? ? ?    ?           ?            ? 01 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Toréadors (Introduction To Act I).WAV
?????????? ? ?    ?           ?            ? 02 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Prélude (Prelude To Act I).WAV
-rw-rw-rw- 1 root root 36681194 Feb 21  2017 03 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- La Grade Montante (Street Urchins' Chorus, Act I).WAV
?????????? ? ?    ?           ?            ? 04 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Habeñera (Act I).WAV
?????????? ? ?    ?           ?            ? 05 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Seguédille (Act I).WAV
-rw-rw-rw- 1 root root 16455464 Feb 21  2017 06 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Les Dragons D'Alcala (Entr'acte, Act II).WAV
?????????? ? ?    ?           ?            ? 07 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Dans Bohème (Gypsy Song, Act II).WAV
?????????? ? ?    ?           ?            ? 08 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Chanson Du Toréador (Act II).WAV
-rw-rw-rw- 1 root root 27743402 Feb 21  2017 09 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Intermezzo (Entr'acte, Act III).WAV
-rw-rw-rw- 1 root root 39886886 Feb 21  2017 10 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Marche Des Contrebandiers (Introduction To Act III).WAV
-rw-rw-rw- 1 root root 52822606 Feb 21  2017 11 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Nocturne (Micaela's Aria, Act III).WAV
-rw-rw-rw- 1 root root 23100378 Feb 21  2017 12 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Argonaise (Entr'acte, Act IV).WAV
?????????? ? ?    ?           ?            ? 13 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Prélude.WAV
?????????? ? ?    ?           ?            ? 14 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV
?????????? ? ?    ?           ?            ? 15 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Adagietto.WAV
?????????? ? ?    ?           ?            ? 16 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Carillon.WAV
?????????? ? ?    ?           ?            ? 17 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Pastorale.WAV
?????????? ? ?    ?           ?            ? 18 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Intermezzo.WAV
?????????? ? ?    ?           ?            ? 19 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Minuetto.WAV
?????????? ? ?    ?           ?            ? 20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV

The filesystem is UTF8 I think, at least if I set

export LANG=en_US.UTF-8

in my profile filenames display with correct names.

Earlier on, these files were renamed by the Java application to their new names, so although an error was reported, it seems there may be some issue with the Java application, but I don't know what.

In my Java start script I have the line

export LC_ALL=en_US.UTF-8

I have not encountered this problem on other Linux systems, Windows, macOS, etc.

Paul Taylor
  • What filesystem is being used to store the files (ext4, FAT, etc.)? – Mikel Rychliski May 10 '20 at 14:39
  • Where are you mounting this from? And which filesystem? – Tarun Lalwani May 10 '20 at 15:13
  • The filesystem is xfs; it's a local filesystem on the Linux box, and the Java application is running directly on the Linux box. – Paul Taylor May 10 '20 at 15:39
  • Can you add the code base ? – Anish B. May 12 '20 at 04:37
  • Could you help the reproduction by running the following statement to list file(s) so we may learn what (octal) bytes are used for (one of) the problematic filenames? `LC_ALL=C ls` This prints on my system for example: `'test1__'$'\303\251''__.txt'` instead of `test1__é__.txt` – JohannesB May 13 '20 at 08:04
  • LC_ALL ls gives 20 - L' Arl??sienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV rather than 20 - L' Arlésienne, suite for orchestra No. 1, from the incidental music- Farandole.WAV – Paul Taylor May 13 '20 at 11:16
  • _My application was complaining it could not read some audio files_ What kind of complaint? Offending code snippet? – Stelios Adamantidis May 16 '20 at 18:10
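The byte-level check requested in the comments can also be approximated from Java. The following is a minimal sketch, not the asker's code: the class name `ListNameBytes` and the helper `escape` are hypothetical. It lists a directory, escapes every non-ASCII byte of each name as octal (similar to the `LC_ALL=C ls` output quoted above), and reports whether each entry can actually be accessed. Note the assumption that the JVM decoded the names as UTF-8; if the locale is something else, the re-encoded bytes may not match what is on disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ListNameBytes {
    // Re-encode the name as UTF-8 and escape every byte outside printable
    // ASCII as a backslash followed by three octal digits.
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(String.format("%03o", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // An entry that is listed but cannot be accessed matches
                // the "cannot access" symptom shown by ls -l.
                boolean accessible = Files.exists(entry, LinkOption.NOFOLLOW_LINKS);
                System.out.println((accessible ? "ok       " : "MISSING  ")
                        + escape(entry.getFileName().toString()));
            }
        }
    }
}
```

For a name containing é encoded as UTF-8, `escape` produces `\303\251` for that character, which is the same pair of bytes shown in the `LC_ALL=C ls` example in the comments.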

4 Answers

0

Try declaring -Dfile.encoding=UTF-8 in your Java start command.
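Before changing the start command, it may help to see which encodings the JVM actually picked up. The sketch below (the class name `EncodingReport` is hypothetical) prints the relevant settings. One assumption to flag: `sun.jnu.encoding` is an OpenJDK implementation detail rather than a documented API, and on OpenJDK it is derived from the locale and is, as far as I know, the value used when converting file names, so it can differ from `file.encoding`.

```java
import java.nio.charset.Charset;

final class EncodingReport {
    // Collect the encoding-related settings visible to this JVM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                // Implementation-specific property (OpenJDK); may be null elsewhere.
                + "\nsun.jnu.encoding=" + System.getProperty("sun.jnu.encoding")
                + "\ndefaultCharset=" + Charset.defaultCharset()
                + "\nLANG=" + System.getenv("LANG")
                + "\nLC_ALL=" + System.getenv("LC_ALL");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this from the same start script as the application shows whether the `export LC_ALL=en_US.UTF-8` line is actually reaching the JVM.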

Dilson Rainov
0

First, two points:

1) The fact that you are getting errors with ls shows that the problem is an issue between the filenames and the filesystem, not Java per se. You would get the same issue whatever language your program was written in - or indeed, if you tried to copy or rename a file directly on the command line.

2) The problem is not with the quote character, as is shown by the fact that the quote character appears in files that were named correctly, e.g.:

-rw-rw-rw- 1 root root 52822606 Feb 21  2017 11 - Carmen Suites for orchestra Nos. 1 & 2 (assembled by Ernest Guirard)- Nocturne (Micaela's Aria, Act III).WAV

So, the problem is with the Unicode character é.

This character is this one : https://www.compart.com/en/unicode/U+00E9 , so it consists of a null byte followed by E9.

The trouble is that POSIX filesystems like xfs do not allow null bytes in filenames (see What are all the illegal characters in the XFS filesystem? )

Upshot is, you cannot have filenames with THAT character in THAT filesystem.

So, you have to change either the filename, or the filesystem.

For example, this page lists filesystems, indicating those that allow unicode in their filenames :

https://en.wikipedia.org/wiki/Filename#Comparison_of_filename_limitations

(As an aside, that list includes Apple's HFS+, but it is interesting to note that it has been replaced by the Apple File System, APFS, which accepts only valid UTF-8 filenames and treats Unicode normalization differently from HFS+: https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html )

The other alternative is to change your Java program to modify the filename, replacing é with e:

    String safeFilename = filename.replaceAll("é", "e");

or if you prefer :

    String safeFilename = filename.replaceAll( "\u00e9", "e" );
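A more general variant of the same idea, as a sketch only: instead of replacing é alone, decompose the name with `java.text.Normalizer` and drop the combining marks, which also covers characters such as ñ and è from the listing in the question. The class and method names (`FilenameSanitizer`, `stripAccents`) are hypothetical and not part of the original answer.

```java
import java.text.Normalizer;

final class FilenameSanitizer {
    // Decompose accented characters (NFD) into base letter + combining mark,
    // then remove all combining marks (Unicode category M).
    static String stripAccents(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("13 - L' Arl\u00e9sienne - Pr\u00e9lude.WAV"));
    }
}
```

This approach also handles names that already arrive in decomposed form (e followed by U+0301), which a literal replacement of "\u00e9" would miss. It loses information, so it only makes sense if ASCII-only names are acceptable.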
racraman
  • 3
    Hi, that is interesting, but U+00E9 is the UTF-16 value; should it not be written to file as the UTF-8 value (which is 0xC3 0xA9)? – Paul Taylor May 11 '20 at 09:50
  • 1
    And furthermore, the filesystem would not *permit* the Java program to include a null in the pathname. At that level it is (should be) encoding agnostic. – Stephen C May 16 '20 at 02:31
  • Awarding you the bounty for the info that POSIX filesystems like xfs do not allow null bytes, even though this hasn't solved the issue, which I am going to put down to filesystem corruption or an issue with the files when they were copied over to the system. – Paul Taylor May 17 '20 at 09:11
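The encodings discussed in these comments are easy to check directly. A minimal sketch (the class name `EncodingBytes` and the helper `hex` are hypothetical) that prints the bytes of é in UTF-8 and in UTF-16:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingBytes {
    // Render the encoded bytes of a string as space-separated hex pairs.
    static String hex(String s, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u00e9", StandardCharsets.UTF_8));    // C3 A9
        System.out.println(hex("\u00e9", StandardCharsets.UTF_16BE)); // 00 E9
    }
}
```

Only the UTF-16 form contains a zero byte. In UTF-8, which is what a UTF-8 locale would write into the directory entry, é is the two bytes C3 A9 and no null byte is involved.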
0

That is a nice, challenging puzzle you have there, as this is hard to debug.

I tried to reproduce this with java.nio.file.Files with Java 11 on Debian 10 with XFS and bash (as ls builtin) but could not reproduce the issue with files named é.

Please try to get to a simpler reproduction scenario and more details otherwise I just have to guess that this may have to do with:

The problem is also that this seems to be hard to debug, as simple tools like strace do not show enough information about the filename bytes in the getdents syscall to see what is going on in the lower-level APIs.

Maybe it is time for a different strategy? Try writing only the full titles of the songs to a playlist file. There will always be special characters that cause problems in some setting: even spaces in scripts if you are not careful, directory separators (slash or backslash), etc. (See: this related question )

JohannesB
  • It is true I'm not using Files everywhere; in fact I'm using Apache Commons to rename. Could that be the issue? – Paul Taylor May 13 '20 at 05:21
  • Not sure yet, please respond to my comment on the question to run: `LC_ALL=C ls` so we can get closer to reproduction of the underlying problem – JohannesB May 13 '20 at 10:48
  • 1
    Right, I have done testing on the same machine that exhibited the problem with some new files, but I cannot get it to reoccur. So I have my application taking filenames that can be handled by ASCII and then renaming them so they require UTF-8, and they continue to be accessible afterwards. But since the machine is a black-box device and my application is the only app on the device that renames files, I do fear that it is my app that must have caused the issue. – Paul Taylor May 14 '20 at 15:24
  • Fixing issues that are not reproducible is crazy, but I learned a lot in the past day about Unicode while reading up on this; you cannot win them all :-) – JohannesB May 14 '20 at 15:39
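Since the problem could not be reproduced, one defensive option is to verify each rename immediately, so that a bad result is caught when it happens rather than later. This is a sketch under assumptions, not the asker's code: `VerifiedRename` and `renameAndVerify` are hypothetical names, and it uses `java.nio.file.Files.move` rather than the Apache Commons call mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class VerifiedRename {
    // Rename a file within its directory, then confirm the new name can be
    // accessed and the old name is gone. Throws if either check fails.
    static Path renameAndVerify(Path source, String newName) throws IOException {
        Path target = source.resolveSibling(newName);
        Files.move(source, target);
        if (!Files.exists(target, LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("Renamed file is not accessible: " + target);
        }
        if (Files.exists(source, LinkOption.NOFOLLOW_LINKS)) {
            throw new IOException("Old name still present after rename: " + source);
        }
        return target;
    }
}
```

`Files.move` throws an exception with a specific cause when the rename fails, which makes it easier to log failures than a boolean return value would. If the new name contains characters the JVM's file name encoding cannot represent, `resolveSibling` is expected to fail with an `InvalidPathException` before anything is renamed.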
0

Your filesystem is corrupted. This is not an application-level issue: the content on your physical disk is not valid according to the filesystem driver that converts the physical disk content into filenames and data. You need to check which device backs your filesystem (the "mount" command shows which device is mounted on which directory; it is probably something like /dev/sda1). You then need to remount it read-only (which can be tricky if this is your root filesystem) and run fsck /dev/sda1 (or whatever your device is) to repair it. Note that on xfs, which is the filesystem mentioned in the comments, the repair tool is xfs_repair, and it is run on an unmounted filesystem. There is no guarantee that you can get those files back.