
With IdFTP, the server I'm connecting to is not using UTF-8 but ANSI. There's nothing special about my code: I simply set Host, Username and Password, and Connect to the server. Then I call the List method with no parameters. Iterating through DirectoryListing gives me incorrect file names. My sample directory name, encoded in the local code page (CP-1250), is:

aąęsśćńółżźz

I thought I would be able to "fix" the file name field by converting it to AnsiString and setting the code page, but it seems to be already broken. Memory dump of DirectoryListing[I].FileName:

a    ?    ?    s    ?    ?    ?    ??   ??   z 
6100 FDFF FDFF 7300 FDFF FDFF FDFF 8FDB DFDF 7A00

Changing GIdDefaultAnsiEncoding or IOHandler.DefStringEncoding (after Connect, before List) makes no difference. I don't want to modify the IdFTP or IdGlobal code, because I use it in other projects that involve Unicode, and those work perfectly. Delphi XE2 or XE7.

As you can see, FData contains the raw file name as a string with 2 bytes per character:

[image]

This happens even if I set IOHandler.DefStringEncoding to any TIdTextEncoding with FIsSingleByte = True and FMaxCharSize = 1. The raw data does look promising, because #$009F is "ź" in CP-1250, but I'm not looking for a per-server, temporary solution. I expected Indy to handle this correctly after setting IOHandler.DefStringEncoding and GIdDefaultAnsiEncoding based on the server's capabilities (UTF-8, or ANSI with a specified encoding).

Total Commander connection log:

[image]

Sir Rufo
rime
  • What is the default codepage for non-unicode aware applications on the OS that's running the Delphi FTP client? (I made the assumption that the client and server are different machines). – Duncan Mar 31 '15 at 14:55
  • Polish (Windows-1250) – rime Mar 31 '15 at 15:24
  • `IdFTP1.DefStringEncoding := IndyTextEncoding(1250);` ? – J... Mar 31 '15 at 15:50
  • @J...: That has no effect in this case, as the listing data is not transmitted on the same socket as FTP commands, and `TIdFTP.List()` transfers `MLSD` listing data as 8bit raw data and does not use `DefStringEncoding` for parsing. – Remy Lebeau Mar 31 '15 at 18:32

2 Answers


Your server supports the MLSD command. Total Commander is sending the MLSD command and not the older LIST command. This is good, because MLSD has a standardized format (see RFC 3659), which includes support for embedded charset information. If no charset is explicitly stated, UTF-8 must be used.

You did not show the command/response log for TIdFTP, but the fact that the TIdFTPListItem.Data property is showing MLSD formatted output data means TIdFTP.List() is also using the MLSD command (by calling TIdFTP.ExtListDir() internally). The output shown does not include an explicit charset attribute, so TIdFTP will decode the filename as UTF-8.
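To make the decoding rule concrete, here is a minimal sketch in Python (used only as a neutral illustration; this is not Indy code) of how a client might split one MLSD entry into its facts and pathname and choose the charset for the pathname. The function name `parse_mlsd_entry` and the sample entry are hypothetical. The sketch assumes the RFC 3659 layout, where the facts end at the first space, and the rule stated above that UTF-8 applies when no charset is given.

```python
def parse_mlsd_entry(line: bytes):
    """Split one raw MLSD entry into (facts, pathname)."""
    # RFC 3659: entry = [ facts ] SP pathname; the facts contain no spaces,
    # so the pathname is everything after the first space.
    facts_part, _, raw_name = line.partition(b" ")
    facts = {}
    for fact in facts_part.split(b";"):
        if fact:
            key, _, value = fact.partition(b"=")
            # Fact names are case-insensitive, so normalize them.
            facts[key.decode("ascii").lower()] = value.decode("ascii")
    # No explicit charset fact means the pathname is decoded as UTF-8.
    charset = facts.get("charset", "utf-8")
    return facts, raw_name.decode(charset, errors="replace")


# Hypothetical sample entry (not captured from the server in question).
entry = b"type=dir;modify=20150331145500; " + "aąęsśćńółżźz".encode("utf-8")
facts, name = parse_mlsd_entry(entry)
print(facts["type"], name)
```

A server that sends a non-UTF-8 pathname without a charset fact falls into the default branch, and the pathname is then decoded with the wrong charset.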

However, the raw filename data that is shown in the TIdFTPListItem.Data property is NOT the correct UTF-8 encoded form of the directory name you have shown (even when stored as a raw 8-bit encoded UnicodeString - which is what TIdFTP.ExtListDir() does internally before parsing it). So the problem is either:

  1. your FTP server is not converting the directory name from CP-1250 to UTF-8 correctly in the first place. Considering that Total Commander appears to be able to handle the listing correctly, this is not likely.

  2. TIdFTP is not storing the raw UTF-8 octet data correctly before parsing it. This is more likely.

Hard to say which is actually the case since you did not show the raw listing data that is actually being transmitted. And you did not specify which exact version of Delphi and Indy you are using, either. Assuming the server is transmitting UTF-8 correctly, you might simply be using an older Indy version that does not handle the UTF-8 transmission correctly. AFAIK, the current version available (10.6.2.5270 at the time of this writing) should be able to handle it, as long as you are using Delphi 2009 or later. If you can provide a Wireshark capture of the raw listing data, I can check if there are any logic issues in TIdFTP that need to be fixed or not.

Remy Lebeau
  • Yesterday I updated Indy in XE2 to 10.6.2.5270 from SVN. It made no difference for List results – rime Apr 01 '15 at 20:06
  • Then please provide a capture of the *raw* listing data that is being transmitted on the network (not the data that is stored in the `TIdFTPListItem`). I will check to see if it is malformed and if not then run it through `TIdFTP` locally to see if there is a logic bug somewhere in the parsing. – Remy Lebeau Apr 01 '15 at 20:15
  • Sorry for not providing any logs, but I had to move on with coding. I can create a test account on that server and share credentials if you are still looking into resolving this issue. – rime Apr 08 '15 at 01:41
  • You can [contact me privately](http://www.lebeausoftware.org) if you want to provide such credentials. Would be nice to see what is actually happening with `TIdFTP` on this server. – Remy Lebeau Apr 08 '15 at 04:14
  • @DamianWoroch: I responded in private, but for everyone else's benefit, the FTP server in question is, in fact, sending the folder name encoded as CP-1250 and not as UTF-8 as required by the MLSD specification when no "charset" is specified in the listing data. So the FTP server is at fault, not Indy. When I force `TIdFTP` to parse the transmitted bytes as CP-1250, the correct filename is then stored in the `TIdFTPListItem.FileName` property as expected. I don't know how TotalCommander is able to process the malformed folder name correctly, but even FileZilla is not able to. – Remy Lebeau Apr 11 '15 at 04:57
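The conclusion in the last comment can be checked against the memory dump in the question. The following Python sketch (an illustration only, not Indy code) encodes the sample directory name as CP-1250, decodes those bytes as UTF-8 with replacement characters, and prints the resulting UTF-16 code units in little-endian byte order. It assumes the client substitutes U+FFFD for each invalid byte, which is what the dump suggests.

```python
name = "aąęsśćńółżźz"
wire_bytes = name.encode("cp1250")  # what the server is reported to send

# Decoding as UTF-8 (the MLSD default) mangles every non-ASCII byte:
# most become U+FFFD, and F3 B3 BF 9F happens to form one valid
# 4-byte sequence, which UTF-16 stores as a surrogate pair.
misdecoded = wire_bytes.decode("utf-8", errors="replace")
utf16_units = misdecoded.encode("utf-16-le").hex(" ", 2).upper()
print(utf16_units)
# 6100 FDFF FDFF 7300 FDFF FDFF FDFF 8FDB DFDF 7A00

# Decoding the same bytes as CP-1250 restores the original name.
print(wire_bytes.decode("cp1250"))
```

The printed code units match the dump of `DirectoryListing[I].FileName`, which is consistent with CP-1250 bytes having been parsed as UTF-8.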

My team needed a quick solution, which I had to provide. It is based on this post: http://forums2.atozed.com/viewtopic.php?p=32301#p32301 and this question: Converting UnicodeString to AnsiString

[image]

Once the FTP listing is finished, I overwrite the FileName property using a function that extracts the file name from Data and then converts the String to a RawByteString with the correct code page. The fix is applied only if the server doesn't support UTF-8. This way I'm able to move around the FTP server (ChangeDir, Get, Put, etc.) without problems.
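The idea behind this workaround can be sketched in Python (an illustration of the byte handling only, not the Delphi code from the screenshot). It assumes the raw listing line holds one byte per 16-bit character, as the #$009F value in the question suggests, and that the file name is everything after the first space of the MLSD entry. The names `name_from_raw_data`, `raw_line` and `server_supports_utf8` are hypothetical.

```python
def name_from_raw_data(data: str, code_page: str) -> str:
    """Recover a file name from a raw MLSD line stored one byte per character."""
    # The pathname is everything after the first space of the entry.
    _, _, raw_name = data.partition(" ")
    # Narrow each character back to the single byte it was widened from,
    # then decode those bytes with the server's real code page.
    return raw_name.encode("latin-1").decode(code_page)


# Hypothetical stand-in for a list item's raw data: each byte of the
# CP-1250 listing line widened to one character (0x9F becomes U+009F).
raw_line = (b"type=dir;modify=20150331145500; "
            + "aąęsśćńółżźz".encode("cp1250")).decode("latin-1")

server_supports_utf8 = False  # assumed to come from the server's feature list
if not server_supports_utf8:
    file_name = name_from_raw_data(raw_line, "cp1250")
    print(file_name)
```

Hard-coding the code page keeps the fix simple, but it has to be chosen per server, since the listing itself does not state which ANSI code page was used.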

rime