UTF-8 character in attachment name

Question

I have an attachment with the character "ò" in its name.

Using an agent, I print this name to the Domino console.

On my test server, the name is printed correctly. On the production server, "ò" is replaced by a "?" character.

Is there any server parameter to set?

UPDATE

I'll post some code to better explain the situation.

I have a Notes document with an embedded attachment whose name contains the "ò" character:

Field Name: $FILE
Data Type: Attached Object
Data Length: 66 bytes
Seq Num: 18
Dup Item ID: 0
Field Flags: ATTACH SIGN SEAL SUMMARY 

Object Type: File
Object ID: 0022E992
Object Length: 567438
File Name: ALLEGATO A instanza fossò.pdf   <-----------------------

Using an agent, I want to get this attachment and copy it into another document. To do this, I call the agent via Ajax with a POST, passing the attachment name as a parameter:

url = '/' + $F('DbJS') +'/myagent?openagent';

var pars = $H({
    attachmentName: attachmentName
});


var ajReq = new Ajax.Request (
    url, 
    {
        method: "post", 
        parameters: pars.toQueryString(), 
        onComplete: doSomething
    }
);

In the Java agent, I first get the parameters from the POST call:

Vector attachmentNameVec = session.evaluate("@urldecode(\"UTF-8\";@left(@right(request_content; \"attachmentName=\");\"&\"))", doc);    
String  attachmentName= (String)attachmentNameVec.elementAt(0);
System.out.println("ATTACHMENT NAME:" + attachmentName);

At this point, I try to get the attachment.

On the test server, the debug print shows:

 ATTACHMENT NAME: ALLEGATO A instanza fossò.pdf

On the production server, it shows:

 ATTACHMENT NAME: ALLEGATO A instanza foss?.pdf

and consequently

doc.getAttachment(attachmentName)

fails.

INFORMATION

Checking the Linux configuration of the servers, I noticed this (output of the locale command):

Test server (right behaviour):

LANG=POSIX
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Production server (wrong behaviour):

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Could it depend on this?

UPDATE 2

These are the results obtained by following Richard's suggestion:

=== Test server (right) ===

UNDECODED FILE NAME->File%201%20per%20Foss%C3%B2.txt.p7m
HEX REPRESENTATION: 46696C6525323031253230706572253230466F73732543332542322E7478742E70376D25374346696C6525323031253230706572253230466F73732543332542322E7478742E70376D
USING PLATFORM CHARSET->File 1 per FossÃ².txt.p7m

=== Production server (wrong) ===

UNDECODED FILE NAME->File%201%20per%20Foss%C3%B2.txt.p7m
HEX REPRESENTATION: 46696C6525323031253230706572253230466F73732543332542322E7478742E70376D25374346696C6525323031253230706572253230466F73732543332542322E7478742E70376D
USING PLATFORM CHARSET->File 1 per Foss??.txt.p7m

As you can see, same HEX representation.

UPDATE 3

Richard, here is the information you requested:

=== Test server (correct) ===

HEX for @urldecode using UTF-8

46696C6520312070657220466F7373F22E7478742E70376D7C
46696C6520312070657220466F7373F22E7478742E70376D

HEX for @urldecode using Platform

46696C6520312070657220466F7373C3B22E7478742E70376D7C
46696C6520312070657220466F7373C3B22E7478742E70376D

=== Production server (wrong) ===

HEX for @urldecode using UTF-8

46696C6520312070657220466F73733F2E7478742E70376D7C46696C6520312070
657220466F73733F2E7478742E70376D

HEX for @urldecode using Platform

46696C6520312070657220466F73733F3F2E7478742E70376D7C46696C6520312070
657220466F73733F3F2E7478742E70376D

What are the details of the servers? Windows? Linux? Which version number, and which language? And which version of Domino? And are you looking at the native console on the actual server, the remote console in the Domino Administrator client, or the console in the Java server controller? And what about the log? Is the character rendered correctly in both servers' logs if you open them from your own Notes client? — Richard Schwartz, Apr 02 '14 at 12:46
Linux servers; I have seen this on 8.5.1FP2 and 8.5.3FP4, English language. I'm looking at the console in Domino Administrator — Andrea Baglioni, Apr 02 '14 at 13:16
Both servers are the same Linux, same version, same language of OS and language of installed Domino? Are you looking at both consoles on the same Domino Administrator client? From the memory inside your agent to the actual screen that you view, every single step along the way introduces the possibility of an incorrect character conversion. If you are getting different results, then there is a difference in the environment somewhere. — Richard Schwartz, Apr 02 '14 at 15:04
BTW, it's not a "UTF-8" character. It's just a character. The native character set of LotusScript and Java agents is Unicode UTF-16, but output is generally converted to the "platform native" character set for display purposes. For many servers, e.g., English language Windows, the platform native character set is also UTF-16, but I've encountered Domino servers (I think they were on Windows) where the platform native character set was one of the Japanese character sets, and that exposed a bug in some of my Java agent code that was using the default platform conversion in a getBytes call. — Richard Schwartz, Apr 02 '14 at 15:10
Thanks Richard! The servers are not the same (as I said, 8.5.3 and 8.5.1), but the Admin client is. The purpose is to get the attachment by name and attach it to another document. In some cases this works, in others it does not (the "ò" character is not "resolved", so fetching the attachment fails). What I said about the console refers to a "System.out.println(attachName)" that I print to see what happens. This is why I said there is probably something in the server configuration (but I don't know what it is). I hope that explains it clearly. — Andrea Baglioni, Apr 02 '14 at 19:19
I don't think it has anything to do with the Domino configuration of either of the two servers, but it might have to do with the Linux operating system configuration. I believe that the JVM takes its character set configuration from the OS. I think it might be helpful if you added the code to your question and pointed to where the two servers give different values for the same document and attachment. — Richard Schwartz, Apr 02 '14 at 20:23
Could you check that the configuration is the same on the two servers: the default web encoding in the server document or, if you are using site documents, Domino Web Engine / Character Set in the site document? — Emmanuel Gleizer, Apr 03 '14 at 10:28
Emmanuel, I have exactly the same configuration (Internet Site) on the two servers — Andrea Baglioni, Apr 03 '14 at 13:06

score 1 · Answer 1 · edited May 23 '17 at 11:57

I strongly suspect that the difference is in fact related to the locales. Obviously, changing the locale on the production server is risky because other things might break, so I am not going to suggest that. Instead, I think it would be best to add some additional code to your agent. First of all add a line like this:

Vector attachmentNameUndecodedVec = session.evaluate("@left(@right(request_content; \"attachmentName=\");\"&\")", doc);

and print out that value.

Also declare a byte[] array and call attachmentName.getBytes() -- without specifying the optional charset argument. Then convert the byte array to a hex string (see here) and print that out. That way, no additional conversion will be done and you will see exactly what is in memory as a result of the @Urldecode call. I think that the difference between what you find on your test and production systems will show us that an automatic charset conversion is occurring somewhere, and by comparing the bytes against different encoding charts we may be able to figure out how to account for it.
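A minimal sketch of the hex dump described above, written as plain Java so it can be tried outside Domino. The class name `HexDump`, the helper `toHex` and the sample file name are illustrative, not part of the original agent; in the agent you would pass the real `attachmentName` instead. It also prints the JVM default charset, since that is what the no-argument `getBytes()` uses. `Charset.forName` is used on the assumption that the Domino 8.5 JVM is Java 6.

```java
import java.nio.charset.Charset;

class HexDump {
    // Convert a byte array to an uppercase hex string, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample; \u00F2 is the "ò" character.
        String attachmentName = "foss\u00F2.pdf";

        // The charset that getBytes() without an argument will use.
        System.out.println("DEFAULT CHARSET: " + Charset.defaultCharset());

        // Bytes as produced by the platform default conversion.
        System.out.println("HEX (default): " + toHex(attachmentName.getBytes()));

        // Bytes with an explicit charset, for comparison.
        System.out.println("HEX (UTF-8):   "
                + toHex(attachmentName.getBytes(Charset.forName("UTF-8"))));
    }
}
```

Only ASCII hex digits are printed, so the console's own character conversion cannot alter what you see.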

I would also suggest trying a call to @Urldecode with the "Platform" charset specified to see what you get there.
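As one more point of comparison, the undecoded value can be decoded in Java itself rather than through @Urldecode. This is only a sketch: the class `DecodeInJava` and its helpers are hypothetical names, the sample body is the percent-encoded name from the question, and it is an assumption that the string decoded this way would match the stored attachment name on the production server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeInJava {
    // Return the raw (still percent-encoded) value of one parameter
    // from a form-encoded request body, or null if it is absent.
    static String rawParam(String body, String name) {
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    // Percent-decode with an explicit charset, independent of the OS locale.
    static String decodeUtf8(String raw) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String body = "attachmentName=File%201%20per%20Foss%C3%B2.txt.p7m";
        String name = decodeUtf8(rawParam(body, "attachmentName"));

        // Compare in memory instead of printing the non-ASCII character.
        System.out.println("MATCH: " + name.equals("File 1 per Foss\u00F2.txt.p7m"));
    }
}
```

In the agent, `body` would be the request_content value read without @Urldecode. The result is compared in memory because printing it would go through the same console conversion that is under suspicion.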

Okay. The hex you printed appears to be for attachmentNameUndecoded. I was actually hoping to see the hex for attachmentName -- after having called the original code that does the UrlDecode with the "UTF-8" argument, and also the hex of the version that was decoded with UrlDecode using the "platform" argument. I guess I didn't make myself clear. Sorry about that! :-) — Richard Schwartz, Apr 07 '14 at 16:02
Sorry for the delay, Richard. The requested information is in UPDATE 3. TIA — Andrea Baglioni, Apr 11 '14 at 10:16
