On one of our ColdFusion 10 Enterprise / CentOS 6.5 servers, umlauts in filenames are saved as ?.

For example:

<CFPROCESSINGDIRECTIVE pageencoding="UTF-8">
<CFSET VARIABLES.umlauts = "ümläüté" />
<CFSET VARIABLES.filename = createUUID() & "-" & VARIABLES.umlauts & ".txt" />
<CFFILE action="write" output="#VARIABLES.umlauts#" file="#expandpath("./" & VARIABLES.filename)#" />
<CFOUTPUT>#VARIABLES.filename#</CFOUTPUT> <!--- outputs something like: A9C9BC8C-983A-5EA6-A4ED411BA0E63C72-ümläüté.txt --->

writes a file called A8B49720-020A-2500-605F4CC73129D07C-?ml??t?.txt to disk. The content of the file is "ümläüté", as expected. Manually creating files with umlauts in the filename is no problem (e.g. touch äöüß.txt works as expected).

More details about the server:
Java Version: 1.6.0_29
Tomcat Version: 7.0.23.0
Java File Encoding: UTF8

$ cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"

$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Any ideas what could cause this behaviour?

Seybsen
  • Is the name of the file *really* hard-coded in the source code like that, or is that just an example? If it is in the file, you'll need a `<cfprocessingdirective>` towards the top of the file. – Adam Cameron Dec 03 '14 at 16:24
  • That's just example code. The `<cfprocessingdirective>` does not solve the problem. Umlauts are still `?`. – Seybsen Dec 03 '14 at 16:27
  • No, the processingdirective is only relevant if it IS in the code. Which I did say. If you output the values on the screen, are they well-formed? – Adam Cameron Dec 03 '14 at 16:38
  • Feels like this question was just asked recently. I will see if I can find the thread. – Leigh Dec 03 '14 at 16:38
  • yes, output in browser is correct. – Seybsen Dec 03 '14 at 16:42
  • @Leigh, I found it. http://stackoverflow.com/questions/27056883/cfdirectory-with-coldfusion-11-issue-with-non-ascii-characters-in-filenames ... no solution as of yet. I think the processingdirective answer is a red herring. I still think this is a DISPLAY issue - the underlying file on the disk has to be ok, right? It's just the encoding - the way it's output to the page? – Mark A Kruger Dec 03 '14 at 16:46
  • Well I see he DID specify that it outputs the file _incorrectly_ to the disk. – Mark A Kruger Dec 03 '14 at 16:47
  • (Edit) @MarkAKruger - Yep, that is the one. Though on second read it may not be exactly the same issue .. – Leigh Dec 03 '14 at 17:00
  • I found a user of Open Blue Dragon (an alternative CFML engine) having exactly the same issue: https://code.google.com/p/openbluedragon/issues/detail?id=516. His solution, quoting, was: `It seems like this has been resolved by setting "LC_ALL=en_US.UTF-8". It seems to be a tomcat problem that it sets question marks for special characters if the charset is unknown.`, or perhaps `"de_DE.UTF-8"` in your case. – Regular Jo Dec 04 '14 at 21:40

1 Answer

I'll post this as an answer for better visibility.

A user of Open Blue Dragon (an alternative CFML Engine) was having exactly the same issue.

If I try to upload a file with, for example, the filename "testätest.pdf", then I have the following situation:

  • The file, OpenBD stores to my filesystem, is named: test?test.pdf
  • The filename, reported via #cffile.ServerFile# is: testätest.pdf

He later came back with this answer:

It seems like this has been resolved by setting "LC_ALL=en_US.UTF-8". It seems to be a tomcat problem that it sets question marks for special characters if the charset is unknown.

Or, in the OP's case, perhaps set LC_ALL to "de_DE.UTF-8".

Source: Issue 516: Special characters (like german "Umlauts") in filenames of uploaded files are replaced with "?"
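
To see why an unknown charset would turn umlauts into question marks, here is a minimal Java sketch. It is not taken from the linked issue; the class name FilenameEncodingDemo and the helper roundTrip are made up for illustration. It relies on the documented behaviour of String.getBytes(Charset), which replaces every character the target charset cannot represent with that charset's replacement byte ("?" for US-ASCII). My assumption, which I have not verified on ColdFusion's bundled Tomcat, is that a JVM started without a UTF-8 locale falls back to an ASCII-like charset for filenames, and that the internal sun.jnu.encoding property (Sun/Oracle/OpenJDK specific, undocumented) reflects that choice.

```java
import java.nio.charset.Charset;

public class FilenameEncodingDemo {

    // Encodes the name with the given charset and decodes it again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // ('?' for US-ASCII) for every character it cannot represent.
    public static String roundTrip(String name, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "ümläüté.txt", written with escapes so the source file encoding does not matter
        String name = "\u00fcml\u00e4\u00fct\u00e9.txt";

        // What this JVM process actually sees; this can differ from an interactive shell.
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        // Assumption: internal property that governs filename encoding on Sun/Oracle JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        // Simulates a JVM that only knows an ASCII charset for filenames.
        System.out.println("US-ASCII round trip: " + roundTrip(name, "US-ASCII"));
    }
}
```

The US-ASCII round trip yields ?ml??t?.txt, the same pattern as the file the OP found on disk. Running the class as the same user and from the same init script that starts ColdFusion should show whether that process really inherits the de_DE.UTF-8 locale the interactive shell reports.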

Regular Jo
  • That is the same solution as in [the thread mentioned above](http://stackoverflow.com/questions/27276362/umlauts-in-filenames-are-truncated-are-shown-as-question-marks#comment43023111_27276362). But the lingering question was [*how* to add that setting to CF's custom Tomcat implementation](http://stackoverflow.com/questions/27056883/cfdirectory-with-coldfusion-11-issue-with-non-ascii-characters-in-filenames#comment42642317_27056883). (I do not know). Any ideas for the other guy? – Leigh Dec 05 '14 at 21:21
  • Unfortunately, I don't either. – Regular Jo Dec 05 '14 at 21:38