I have an application (written in Delphi) that uses the Firebird 1.5.5 embedded engine. I use this engine because the application has to work with currently deployed Firebird databases, and newer embedded engines won't open those database files (ODS 10.1) correctly. All strings in the database are defined as VARCHAR(N), where N varies. The application used to be an ANSI application, so the data contains ISO Latin-1 characters. The application has now been upgraded to Unicode. In order to store Unicode characters in the existing databases (around 10k instances), I write a UTF-8 BOM (if you can call it that) at the start of the string; the remainder of the string is then treated as UTF-8 and decoded as such by the database layer. This way we can keep using all the existing databases and still store all Unicode characters.
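To make the marker scheme concrete, here is a minimal sketch of how such an encode/decode pair might look. The helper names `EncodeWithMarker` and `DecodeStored` are hypothetical, and the marker is assumed to be the three octets EF BB BF; the real application may use a different marker and works through UIB rather than on raw byte arrays.

```delphi
program Utf8MarkerSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: UTF-8 encode S and prefix it with the assumed
// marker octets EF BB BF.
function EncodeWithMarker(const S: UnicodeString): TBytes;
var
  Payload: TBytes;
begin
  Payload := TEncoding.UTF8.GetBytes(S);
  SetLength(Result, 3 + Length(Payload));
  Result[0] := $EF;
  Result[1] := $BB;
  Result[2] := $BF;
  if Length(Payload) > 0 then
    Move(Payload[0], Result[3], Length(Payload));
end;

// Hypothetical helper: values that start with the marker are decoded as
// UTF-8, everything else is treated as legacy ISO Latin-1 data.
function DecodeStored(const B: TBytes): UnicodeString;
var
  I: Integer;
begin
  if (Length(B) >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
  begin
    if Length(B) > 3 then
      Result := TEncoding.UTF8.GetString(B, 3, Length(B) - 3)
    else
      Result := '';
  end
  else
  begin
    // ISO Latin-1 maps each byte to the code point with the same value.
    SetLength(Result, Length(B));
    for I := 0 to Length(B) - 1 do
      Result[I + 1] := WideChar(B[I]);
  end;
end;

var
  Original: UnicodeString;
  Legacy: TBytes;
begin
  // 'café €' written with character codes to avoid source-encoding issues.
  Original := 'caf'#$00E9' '#$20AC;
  Writeln(DecodeStored(EncodeWithMarker(Original)) = Original);

  // A legacy Latin-1 value (single byte E9) has no marker and is widened 1:1.
  SetLength(Legacy, 1);
  Legacy[0] := $E9;
  Writeln(DecodeStored(Legacy) = #$00E9);
end.
```

The scheme only holds if the bytes reach the database and come back unmodified; any code page conversion in between breaks the marker check.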
This works well on all machines in western Europe. But when the application runs in Romania (a Windows PC with Romanian language settings), the database engine alters the characters. For example, the UTF-8 string starts with the octet EF (ï), but the database engine returns it as octet 69 (i).
How can this problem be solved for existing databases?
NB: I tried to specify the character set OCTETS when opening the database (using the UIB library), but this fails because the charset is unknown.
I found out that the problem lies within UIB (the database layer used in this case). UIB handles csNONE in such a way that a bytewise string (datatype AnsiString) is first converted to a UnicodeString by simply expanding the bytes to words, and is then narrowed again using the current thread's code page. Since Romania does not use ISO Latin-1 as its code page, the data gets corrupted there.
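The widen-then-narrow behaviour can be mimicked outside UIB with a small sketch. This is not UIB's code: `WidenBytes`, `NarrowWithCodePage` and `ToHex` are hypothetical helpers, and the code pages 1252 (western European Windows) and 1250 (Romanian Windows) are assumed to stand in for the thread code pages on the affected machines. The exact substitute bytes depend on the platform's conversion tables, so the program only prints what it gets.

```delphi
program CsNoneCorruptionSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Step 1: expand every byte to one UTF-16 code unit of the same value.
function WidenBytes(const B: TBytes): UnicodeString;
var
  I: Integer;
begin
  SetLength(Result, Length(B));
  for I := 0 to Length(B) - 1 do
    Result[I + 1] := WideChar(B[I]);
end;

// Step 2: narrow the string again with a given ANSI code page.
function NarrowWithCodePage(const S: UnicodeString; CodePage: Integer): TBytes;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetBytes(S);
  finally
    Enc.Free;
  end;
end;

// Format a byte array as space-separated hex octets for printing.
function ToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to Length(B) - 1 do
    Result := Result + IntToHex(B[I], 2) + ' ';
end;

var
  Stored: TBytes;
begin
  // The assumed marker octets that start every UTF-8 value.
  SetLength(Stored, 3);
  Stored[0] := $EF;
  Stored[1] := $BB;
  Stored[2] := $BF;

  // Code page 1252 contains U+00EF, U+00BB and U+00BF at the same byte
  // values, so the bytes should survive the round trip.
  Writeln('cp1252: ', ToHex(NarrowWithCodePage(WidenBytes(Stored), 1252)));

  // Code page 1250 has no U+00EF, so the first octet cannot survive.
  Writeln('cp1250: ', ToHex(NarrowWithCodePage(WidenBytes(Stored), 1250)));
end.
```

This matches the symptom in the question: the data is intact on western European machines and changes only where the code page lacks the characters that the widened bytes happen to represent.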
For now I have changed the following routine in UIBLib (i.e. when an AnsiString is given, the charset is csNONE and an AnsiString parameter is requested, do no conversion at all):
    procedure TSQLDA.EncodeStringA(Code: Smallint; Index: Word; const str: AnsiString);
    begin
    {$IFDEF UNICODE}
      if FCharacterSet = csNONE then begin // new
        EncodeStringB(Code, Index, str); // new
      end else begin // new
        EncodeStringB(Code, Index, MBUEncode(UnicodeString(str), CharacterSetCP[FCharacterSet]));
      end; // new
    {$ELSE}
      EncodeStringB(Code, Index, str);
    {$ENDIF}
    end;
Now I need to check whether this behavior is correct for the library and send the maintainer a patch.