Problem

I have a client running a web application that polls a specific directory. For files whose names contain special characters, those characters are converted to '?'.

Example error: java.io.FileNotFoundException: Garc??a.pdf (No such file or directory)

The filename does not come from the database, and the file has the correct name on disk. I have not been able to reproduce the problem on my own machine.

Tomcat Startup Params

 [-Dnop] 
 [-Dcatalina.home=/app/tomcat] 
 [-Dcatalina.base=/app/tomcat] 
 [-Djava.io.tmpdir=/app/tomcat/tmp] 
 [-Djava.endorsed.dirs=/app/tomcat/lib-endsed] 
 [-Dep.tomcat.http.port=8080] 
 [-Dep.tomcat.shutdown.port=64006] 
 [-Dep.tomcat.rmi.port=64008] 
 [-Dep.tomcat.sso.enabled=false] 
 [-Djava.security.auth.login.config=/app/tomcat/etc/jaas.config] 

 **[-Dfile.encoding=UTF-8]** 

 [-Dcom.sun.management.jmxremote=true] 
 [-Dcom.sun.management.jmxremote.port=64007] 
 [-Dcom.sun.management.jmxremote.authenticate=true] 
 [-Dcom.sun.management.jmxremote.ssl=false] 
 [-Dspring.profiles.active=production] 
 [-Degrants.configuration=/app/tomcat/etc/test.properties] 
 [-Dops.product=tomcat] 
 [-Dops.node.number=uniq] 
 [-Xms1024m] 
 [-Xmx1024m] 
 [-XX:PermSize=128m] 
 [-XX:MaxPermSize=128m] 
 [-XX:+UseParallelGC] 
 [-XX:+AggressiveOpts] 
 [-XX:+UseFastAccessorMethods] 

Folder

The folder in question is a shared directory mounted over NFS.

Java Version

java version "1.6.0_91" 
Java(TM) SE Runtime Environment (build 1.6.0_91-b13) 
Java HotSpot(TM) Server VM (build 20.91-b07, mixed mode) 

System Lang Parameter

env | grep LANG 
NLS_LANG=American_America.UTF8 

Checking the Charset/Encoding

I added the code from the answer to "How to Find the Default Charset/Encoding in Java?". The log shows that UTF-8 is being used:

Default Charset=UTF-8
file.encoding=UTF-8
Default Charset in Use=UTF8 
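
The check above reports `file.encoding` and the default charset, which mainly govern how file contents and strings are converted. One possibly relevant gap, stated as an assumption rather than a finding: on Sun/Oracle JVMs the property `sun.jnu.encoding`, derived from the process locale (`LANG`, `LC_ALL`, `LC_CTYPE`), is generally understood to be what decodes file names, and `NLS_LANG` is an Oracle client variable rather than a locale variable. A minimal sketch that logs these next to the values already checked (the class name `EncodingReport` is hypothetical):

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects encoding-related settings of the running JVM.
class EncodingReport {

    // Returns the settings in a stable order; values may be null when unset.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("Default Charset", Charset.defaultCharset().name());
        report.put("file.encoding", System.getProperty("file.encoding"));
        // Assumption: this property reflects the encoding used for file names.
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        // Locale variables as seen by the Tomcat process, not by a login shell.
        report.put("LANG", System.getenv("LANG"));
        report.put("LC_ALL", System.getenv("LC_ALL"));
        report.put("LC_CTYPE", System.getenv("LC_CTYPE"));
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Running this inside the web application (for example from the same scheduled thread) rather than from a shell matters, because the environment of the Tomcat process can differ from that of an interactive login.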

Question

Other than asking the client to change the JDK, I can't think of anything else to try. Any suggestions? What could be causing this, and how do I resolve it?

Update: Polling Folders

Filenames are obtained by polling the actual file system. We use the listFiles() method of the java.io.File class to get the files within each folder. Ref: https://docs.oracle.com/javase/6/docs/api/java/io/File.html#listFiles()
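
To see exactly what listFiles() hands back, independent of how the log file or terminal renders it, the names can be dumped as UTF-8 hex. This is a minimal sketch, not the application's polling code; the class name `FileNameHexDump` and the directory argument are hypothetical:

```java
import java.io.File;
import java.nio.charset.Charset;

// Hypothetical diagnostic: prints each file name with the hex of its UTF-8 bytes.
class FileNameHexDump {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Formats bytes as space-separated uppercase hex pairs, e.g. "C3 AD".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Pairs a name with its hex form so the two can be compared in the log.
    static String describe(String name) {
        return name + " -> " + toHex(name.getBytes(UTF8));
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : files) {
            // exists() shows whether the JVM can map the decoded name back to the file.
            System.out.println(describe(f.getName()) + " exists=" + f.exists());
        }
    }
}
```

A correctly decoded "García" gives `47 61 72 63 C3 AD 61`. If the position of the accented letter shows `3F 3F` or `EF BF BD EF BF BD` instead, the name was already damaged by the time listFiles() returned it.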

Menelaos
  • It might be a problem with the client using the wrong codepage to render the page. What encoding is used to transmit the generated HTML? Is there a `Content-Type` header or a `<meta>` tag specifying the codepage used? – piet.t Jun 21 '16 at 07:15
  • @piet.t It's not a rendering issue but instead within a scheduled thread that parses files. The file in the folder displays the correct name but Java cannot open it and interprets the file as including ?? characters. – Menelaos Jun 21 '16 at 07:21
  • What operating system are you using? – Alastair McCormack Jun 21 '16 at 07:33
  • @Alastair McCormack They are using CentOS 6.8. – Menelaos Jun 21 '16 at 07:40
  • Also, the error message is showing incorrect characters. Are you sure Java has decoded the filename correctly from the DB? As logging and writing to a file can introduce more encoding confusion, the best way to check is to do `filenameString.getBytes("UTF-8")`, then convert to hex using http://stackoverflow.com/questions/2817752/java-code-to-convert-byte-to-hexadecimal. Then paste the result. – Alastair McCormack Jun 21 '16 at 07:43
  • I misread this: "I do not get the specific filename from the database" - so where do you get the filename from? – Alastair McCormack Jun 21 '16 at 07:45
  • @Alastair McCormack The filename is obtained by polling the actual file system. We are using the `listFiles()` method of the Java `File` class to get back the files within folders. Thanks – Menelaos Jun 21 '16 at 07:48
  • OK, so `listFiles` is returning strings with question marks in them? – Alastair McCormack Jun 21 '16 at 07:51
  • @Alastair McCormack Seems so. Your questions seem to point at something similar to: http://stackoverflow.com/questions/3610013/file-listfiles-mangles-unicode-names-with-jdk-6-unicode-normalization-issues . I will have to ask the client to update their JDK to 1.7 in order to test the hypothesis. If you wish, please post an answer so I can upvote. – Menelaos Jun 21 '16 at 07:52
  • I'm not sure it's that, tbh, as that's related to normalisation. Unless you're doing some excessive manipulation of the string before trying to open the file, it's more likely that the encoding of the filenames is not UTF-8. How do you validate the encoding of the filenames? – Alastair McCormack Jun 21 '16 at 07:58
  • Also, it's still worth checking the UTF-8 hex output to see how your string correlates to the filenames. – Alastair McCormack Jun 21 '16 at 08:08
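
The comments above raise the possibility of a mismatch between the encoding of the filenames and the encoding used to decode them. As a general illustration only (whether the JVM's native filename decoding behaves the same way on this system is an assumption), the sketch below decodes the UTF-8 bytes of a name once as UTF-8 and once as US-ASCII. The class name `CharsetMismatchDemo` is hypothetical:

```java
import java.nio.charset.Charset;

// Hypothetical demo: same bytes, two decoders, different strings.
class CharsetMismatchDemo {

    // Decodes raw bytes with the named charset; undecodable bytes become U+FFFD.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "García.pdf" written with an escape so the source file encoding is irrelevant.
        String original = "Garc\u00EDa.pdf";
        byte[] raw = original.getBytes(Charset.forName("UTF-8"));

        String asUtf8 = decode(raw, "UTF-8");
        String asAscii = decode(raw, "US-ASCII");

        System.out.println("UTF-8 decode round-trips: " + asUtf8.equals(original));
        System.out.println("UTF-8 decode length:      " + asUtf8.length());
        System.out.println("US-ASCII decode length:   " + asAscii.length());
    }
}
```

The single character "í" occupies two bytes in UTF-8 (C3 AD), so an ASCII-only decoder produces two replacement characters in its place. That matches the shape of "Garc??a.pdf" in the error message, though it does not by itself prove where the mismatch occurs.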
