Stop Firebird modifying strings based upon Windows charset

Question

I have an application (written in Delphi) that uses the Firebird 1.5.5 embedded engine. I use this engine because the application has to work with currently deployed Firebird databases, and newer embedded engines won't open those database files correctly (ODS 10.1). All strings in the database are defined as VARCHAR(N), where N varies. The application used to be an ANSI application, so the existing data contains ISO Latin-1 characters. The application has now been upgraded to a Unicode application. To store Unicode characters in the existing databases (around 10k instances), I write a UTF-8 BOM (if you can call it that) at the start of the string; the remainder of the string is then treated as UTF-8 and decoded as such by the database layer. This way we can keep using all the existing databases and still store all Unicode characters.

This works well on all machines in western Europe. But when the application runs in Romania (a Windows PC with Romanian language settings), the database engine alters the characters. For example, the UTF-8 string starts with octet EF (ï), but the database engine returns it as octet 69 (i).

How can this problem be solved for existing databases?

NB: I tried to specify the character set OCTETS when opening the database (using the UIB library), but this fails because the charset is reported as unknown.

Update: I found out that the problem lies within UIB (the database layer used in this case), not in Firebird itself. UIB handles csNONE in such a way that a bytewise string (datatype AnsiString) is first converted to a UnicodeString by simply expanding each byte to a word, and is later narrowed again using the current thread's code page. Since Romania does not use ISO Latin-1 as its code page, the data gets corrupted there.
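The widen-then-narrow round trip described above can be illustrated outside Delphi. The following is a minimal Python sketch, not UIB code: the helper name `widen_then_narrow` is hypothetical, and `cp1252` and `cp1250` are assumed to stand in for the Western European and Romanian Windows code pages. Python replaces unmappable characters with `?`, whereas Windows may substitute a best-fit letter (which would explain ï coming back as i), so only the fact that the bytes change is shown, not the exact result on Windows.

```python
# Payload as the application stores it: the UTF-8 marker EF BB BF, then UTF-8 text.
raw = b"\xef\xbb\xbf" + "é".encode("utf-8")


def widen_then_narrow(data: bytes, codepage: str) -> bytes:
    """Hypothetical helper mimicking the conversion described for csNONE."""
    # Step 1: expand every byte to one 16-bit character (same as decoding Latin-1).
    widened = data.decode("latin-1")
    # Step 2: narrow again with the given code page. Characters the code page
    # lacks become "?" here; Windows may choose a best-fit character instead.
    return widened.encode(codepage, errors="replace")


# Western European code page: these bytes survive the round trip.
print(widen_then_narrow(raw, "cp1252") == raw)
# Central European code page (assumed for Romania): the bytes are altered.
print(widen_then_narrow(raw, "cp1250") == raw)
print(widen_then_narrow(raw, "cp1250").hex())
```

The round trip is only lossless when every widened character exists in the target code page, which is why the problem stays hidden on western European machines.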

For now I have changed the following routine in UIBLib so that no conversion is done at all when an AnsiString is given, the charset is csNONE, and an AnsiString parameter is requested:

  procedure TSQLDA.EncodeStringA(Code: Smallint; Index: Word; const str: AnsiString);
  begin
  {$IFDEF UNICODE}
    if FCharacterSet = csNONE then begin // new
      // csNONE: pass the bytes through untouched, no code page round trip
      EncodeStringB(Code, Index, str);   // new
    end else begin                       // new
      EncodeStringB(Code, Index, MBUEncode(UniCodeString(str), CharacterSetCP[FCharacterSet]));
    end;                                 // new
  {$ELSE}
    EncodeStringB(Code, Index, str);
  {$ENDIF}
  end;

Now I need to check whether this behavior is correct for the library and, if so, send the maintainer a patch.

What connection character set are you using right now, and what is the default character set and/or the specific column character set? — Mark Rotteveel, Feb 12 '13 at 16:33
Also: the characterset `OCTET` doesn't exist, it is `OCTETS`, but I am not actually sure if you can use that as a connection characterset — Mark Rotteveel, Feb 12 '13 at 16:33
The charset used when connecting is specified through an enumeration in UIB and is currently csNONE. Specifying csOCTETS or csISO8859_1 yields an error saying the charset is unknown or not defined. Sorry that the name OCTET was used instead of OCTETS (will change it) — Ritsaert Hornstra, Feb 12 '13 at 17:15
Does your Firebird embedded also include the `intl` folder with `fbintl.conf` and `fbintl.dll`, and does it have those character sets installed? (If not, read up on `doc\README-intl.txt`.) Also be aware that with character set NONE the server sends the data as is and the client uses the local character set for conversion. If you get 0x69 back, then either it is stored as such or your local character set conversion does this. — Mark Rotteveel, Feb 12 '13 at 18:36
