
I have a piece of Delphi code that encodes an ASN.1 object identifier (OID) into a character string. The string is used later in the program, and in the end the program converts each character of the string into its hexadecimal value for further processing.

This code was written for Windows. Now I am trying to port it to Linux (CentOS). I am using RAD Studio to compile it for both Windows and Linux.

On Linux, the same program produces different output. I suspected that the difference is caused by the character set or locale settings on Windows and Linux: on Windows the character set is 'Windows-1252', while on my Linux machine the default locale is 'en_US.utf8'. So, hoping to get the same output on both platforms, I added a new locale on Linux with the following command:

sudo localedef -v -c -i /usr/share/i18n/locales/en_US -f ./CP1252 en_US.CP1252

After this step, when I ran this command:

localectl list-locales | grep -i 1252

I got this output:

en_US.cp1252

Then I ran:

localectl set-locale LANG="en_US.cp1252"

After this step, the 'locale' command shows the following output:

LANG=en_US.cp1252
LC_CTYPE="en_US.cp1252"
LC_NUMERIC="en_US.cp1252"
LC_TIME="en_US.cp1252"
LC_COLLATE="en_US.cp1252"
LC_MONETARY="en_US.cp1252"
LC_MESSAGES="en_US.cp1252"
LC_PAPER="en_US.cp1252"
LC_NAME="en_US.cp1252"
LC_ADDRESS="en_US.cp1252"
LC_TELEPHONE="en_US.cp1252"
LC_MEASUREMENT="en_US.cp1252"
LC_IDENTIFICATION="en_US.cp1252"
LC_ALL=

I logged out and logged in again, and the 'locale' command shows the same output in the new session.

I ran my test program on Linux hoping to get the same output as on Windows, but to my surprise the output is the same as before the locale change.

Can someone please point out what is missing here? Is the locale setting not taking effect, or am I doing something wrong?

Below is the sample program I am using and its output on Windows and Linux.

program TestProj;
{$APPTYPE CONSOLE}

{$R *.res}

uses
  SysUtils;

function CutString(const Trenner : AnsiString; var s : AnsiString) : AnsiString;
var
  i : integer;
begin
  if s = '' then
    CutString := ''
  else begin
    i := pos(Trenner, s);
    if i = 0 then begin
      CutString := s;
      s := ''
    end
    else begin
      CutString := copy(s, 1, i - 1);
      delete(s, 1, i - 1 + length(Trenner))
    end
  end
end;

function CodeObjectId(oid : AnsiString) : AnsiString;
var
  s : Ansistring;
  i, n : integer;
  testInt : integer;
  testInt1 : integer;
  testChar : AnsiChar;
  testChar1 : AnsiChar;
begin
  i := 0;
  while oid <> '' do begin
    inc(i);
    n := StrToInt(CutString('.', oid));
    if i = 1 then
      s := ansichar(40 * n)
    else if i = 2 then
      s := ansichar(ord(s[1]) + n)
    else begin
      if n > $3fff then
        s := s + ansichar($80 or ((n shr 14) and $7f)) + ansichar($80 or ((n shr 7) and $7f)) + ansichar(n and $7f)
      else if n > $7f then   begin
        s := s + ansichar($80 or ((n shr 7) and $7f)) + ansichar(n and $7f);
        testInt :=  ($80 or ((n shr 7) and $7f));
        testChar := ansichar(testInt);
        testInt1 :=  (n and $7f);
        testChar1 := ansichar(testInt1);
        writeln('testInt : ', testInt);
        writeln('testChar : ', testChar);
        writeln('testInt1 : ', testInt1);
        writeln('testChar1 : ', testChar1);
      end
      else
        s := s + ansichar(n and $7f)
    end
  end;
  CodeObjectId := s
end;

begin
  try
    writeln('-----------------------------------------------');
    CodeObjectId('1.3.12.2.1107.3.66.3.1');
    writeln('-----------------------------------------------');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Output in Windows:

testInt : 136
testChar : ^
testInt1 : 83
testChar1 : S

Output in Linux:

testInt : 136
testChar : ▒
testInt1 : 83
testChar1 : S

As mentioned above, the encoded string is used later in the program to get the corresponding hexadecimal values for further processing. This hexadecimal value (the final output) differs because the encoded characters differ between Windows and Linux.
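One way to check whether the locale change reaches the Delphi runtime at all is to print the code page the RTL actually uses for `AnsiString`, rather than what the `locale` command reports. The sketch below relies on the RTL's `DefaultSystemCodePage` variable and `StringCodePage` function; the program name `ShowCodePage` is hypothetical, and no particular value is assumed for what it prints on CentOS.

```pascal
program ShowCodePage; // hypothetical diagnostic program
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  s: AnsiString;
begin
  // Code page the RTL assumes for plain AnsiString / AnsiChar data
  Writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // An AnsiString created at runtime by converting a UnicodeString;
  // StringCodePage reports the code page stored with the data
  s := AnsiString(ParamStr(0));
  Writeln('StringCodePage(s)    : ', StringCodePage(s));
end.
```

Running it on both systems shows whether `AnsiString` data is being interpreted as 1252 or as something else on Linux.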

Remy Lebeau
bobix
  • Do I understand right that the problem is that `testChar` is printed differently for a value 136 of `testInt` that corresponds to a non-ASCII character? Then you could create a minimal reproducible example that mainly consists of `testInt := 136; testChar := ansichar(testInt); writeln('testInt : ', testInt); writeln('testChar : ', testChar);` I guess it is only a question how the terminal/console *displays* a character with code 136. – Bodo May 19 '21 at 10:58
  • To decouple your code from the system codepage you should use a self declared type like TMyAnsiString = AnsiString(1252) . – Uwe Raabe May 19 '21 at 11:06
  • Please edit your question and add more details about what's the purpose of "encoding the ASN1 objectId(iod) to character string". What will you do with this string? Show a few examples of OID input and expected "encoded" result. Every byte outside the ASCII printable character range (32..126 decimal or $20..$7E) might be non-portable if you have to deal with different encodings. – Bodo May 19 '21 at 11:06
  • @Bodo, yes, the issue I am facing is that the character corresponding to decimal 136 differs: Windows uses the 'Windows-1252' character set (https://en.wikipedia.org/wiki/Windows-1252) and Linux uses UTF8 (https://en.wikipedia.org/wiki/ISO/IEC_8859-1). In Windows the value 136 corresponds to the character '^', but in UTF8 the value 136 is undefined. Regarding the purpose of encoding the ASN1 OID to a string: the application that contains this code uses the encoded string later in multiple places to get the hexadecimal value corresponding to each char. – bobix May 19 '21 at 11:49
  • @Uwe Raabe, could you please provide more details on this approach? It would be great if you could provide any reference to this approach. Thanks – bobix May 19 '21 at 11:53
  • This doesn't seem to be a Delphi or programming question - the issue is configuring the operating system to interpret a specific ANSI codepage. Maybe your .rc file is calling [unicode_start](https://linux.die.net/man/1/unicode_start), so whatever locale you have set up gets overridden. We don't have access to your computer so we can't tell you what you have not configured correctly. You haven't even told us what distribution you're using. – J... May 19 '21 at 12:17
  • @bobix Please edit your question and add all background information there. UTF8 is different from 8859-1. If the encoded result is not intended to be displayed as a string, and if other parts of the software only use the numeric values of the bytes, then the real problem might be that the data is in fact an array of bytes which should not be interpreted as a string. Maybe you can change the data type to a dynamic `Array of Byte` or simply interpret the AnsiString as such. See also https://stackoverflow.com/a/5929189/10622916 – Bodo May 19 '21 at 12:25
  • Why don't you stop using 8 bit text encodings, other than UTF-8 of course? – David Heffernan May 19 '21 at 12:46
  • @DavidHeffernan The code that I am trying to port to Linux is a very big legacy code base written in Delphi a long time ago, approximately 20-25 years back. So it is very difficult for me to change the encoding, as it can affect other parts of the entire system. Thanks – bobix May 19 '21 at 13:08
  • I think you are just putting off the inevitable – David Heffernan May 19 '21 at 13:23
  • @bobix You wrote "*the encoded string is used later in the program to get the corresponding hexadecimal values for further processing. This hexadecimal value(the final output) is different as the encoded characters are different in Windows and Linux.*". You don't show in your question how the hexadecimal values differ, instead you show that the values `testInt` and `testInt1` are the same on Windows and Linux. Are there any other conversions/interpretations of the resulting value of type `AnsiString` involved in your program? Please create a minimal reproducible example that shows the difference. – Bodo May 19 '21 at 13:41
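Uwe Raabe's suggestion can be sketched as follows: a string type bound to code page 1252 carries that code page with its data, so the RTL does not have to take it from the system settings. The type name `TMyAnsiString` comes from the comment; the program name and the way the byte is stored are illustrative assumptions. Writing through `s[1]` after `SetLength` stores the raw byte without any code-page conversion.

```pascal
program CodePageBound; // hypothetical demo program
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // String type whose payload is always tagged as Windows-1252
  TMyAnsiString = type AnsiString(1252);

var
  s: TMyAnsiString;
  u: string;
begin
  // Store the raw byte $88 directly, bypassing any conversion
  SetLength(s, 1);
  s[1] := AnsiChar($88);
  Writeln('code page : ', StringCodePage(s));
  Writeln('byte      : ', Ord(s[1]));
  // Converting to UnicodeString uses the tagged code page (1252),
  // so $88 should become U+02C6 (modifier letter circumflex accent)
  u := string(s);
  Writeln('as UTF-16 : U+', IntToHex(Ord(u[1]), 4));
end.
```

The byte value stays 136 either way; what the tagged type changes is how the RTL converts it when the string is turned into text.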

1 Answer


The algorithm appears to be the ASN.1 encoding of an object identifier (OID), which is a series of binary bytes that do not necessarily form a valid string.

If your application interprets this binary data as a string, that is an error.

See https://stackoverflow.com/a/5929189/10622916 or https://learn.microsoft.com/en-us/windows/win32/seccertenroll/about-object-identifier?redirectedfrom=MSDN

As you can see from the numeric values of testInt and testInt1, the binary result of the calculation is the same (at least for the two bytes shown in the question). Only the interpretation of the binary data as a string or as characters in the system's encoding seems to be different.
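To make the point about interpretation concrete, the sketch below decodes the single byte 136 ($88, the value of `testInt`) once as Windows-1252 and once as UTF-8 using the RTL's `TEncoding` class. It assumes code page 1252 is available to the RTL on the target platform. In Windows-1252, $88 is U+02C6 (ˆ, modifier letter circumflex accent), which matches the caret-like glyph on Windows; in UTF-8 a lone $88 is a continuation byte, not a complete character, which is consistent with the '▒' placeholder seen on Linux.

```pascal
program InterpretByte; // hypothetical demo program
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Bytes: TBytes;
  Cp1252: TEncoding;
  AsCp1252, AsUtf8: string;
begin
  // The byte produced for the high part of arc 1107 (testInt = 136)
  Bytes := TBytes.Create($88);

  // Interpretation 1: Windows-1252 (assumes the RTL can load code page 1252)
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    AsCp1252 := Cp1252.GetString(Bytes);
  finally
    Cp1252.Free;
  end;

  // Interpretation 2: UTF-8, where a lone $88 is only a continuation byte
  try
    AsUtf8 := TEncoding.UTF8.GetString(Bytes);
  except
    on EEncodingError do
      AsUtf8 := '';
  end;

  Writeln('byte value      : ', Bytes[0]);
  Writeln('as Windows-1252 : U+', IntToHex(Ord(AsCp1252[1]), 4));
  if AsUtf8 = '' then
    Writeln('as UTF-8        : no character decoded')
  else
    Writeln('as UTF-8        : U+', IntToHex(Ord(AsUtf8[1]), 4));
end.
```

The byte itself never changes; only the character the chosen encoding maps it to does.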

In my opinion the data type AnsiString for the result of CodeObjectId is wrong (or at least misleading); it should be a dynamic Array of Byte instead. If you cannot change the data type, you should at least change how the data is interpreted.
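A minimal sketch of the byte-array approach: the same OID encoding as `CodeObjectId`, but returning `TBytes`, so no code page is ever involved. `EncodeOid`, `AppendBase128` and `BytesToHex` are hypothetical names. Unlike the original function, this version emits as many base-128 groups as an arc needs (up to 32 bits) instead of at most three, also for the combined first sub-identifier, and it assumes a well-formed dotted OID with at least two arcs.

```pascal
program OidBytes; // hypothetical demo program
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: appends the base-128 form of one sub-identifier
// (7 payload bits per byte, bit 7 set on every byte except the last)
procedure AppendBase128(var Bytes: TBytes; Value: Cardinal);
var
  Groups: array[0..4] of Byte; // 5 groups of 7 bits cover 32 bits
  Count, I, Start: Integer;
begin
  Count := 0;
  repeat
    Groups[Count] := Value and $7F; // least significant group first
    Value := Value shr 7;
    Inc(Count);
  until Value = 0;
  Start := Length(Bytes);
  SetLength(Bytes, Start + Count);
  // Write the groups most significant first
  for I := Count - 1 downto 0 do
    if I > 0 then
      Bytes[Start + Count - 1 - I] := Groups[I] or $80
    else
      Bytes[Start + Count - 1 - I] := Groups[I];
end;

// Hypothetical replacement for CodeObjectId that returns raw bytes
function EncodeOid(const Oid: string): TBytes;
var
  Arcs: TArray<string>;
  I: Integer;
begin
  Result := nil;
  Arcs := Oid.Split(['.']);
  if Length(Arcs) < 2 then
    raise EConvertError.Create('An OID needs at least two arcs');
  // The first two arcs X.Y share one sub-identifier: 40 * X + Y
  AppendBase128(Result, 40 * StrToInt(Arcs[0]) + StrToInt(Arcs[1]));
  for I := 2 to High(Arcs) do
    AppendBase128(Result, StrToInt(Arcs[I]));
end;

// Space-separated uppercase hex dump, for comparing platforms
function BytesToHex(const Bytes: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Bytes) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;

begin
  Writeln(BytesToHex(EncodeOid('1.3.12.2.1107.3.66.3.1')));
end.
```

For the OID from the question this prints `2B 0C 02 88 53 03 42 03 01`, the same bytes as the hex dump of the AnsiString result; the difference is that a `TBytes` value cannot be mistaken for text.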

If you only want to compare the result of function CodeObjectId on different systems, I suggest printing the character codes as hexadecimal bytes instead of printing the result as a string, e.g.

  // assumes s: AnsiString and i: Integer are declared in the enclosing routine
  s := CodeObjectId('1.3.12.2.1107.3.66.3.1');
  // writeln(s); // wrong interpretation as a string
  for i := 1 to Length(s) do
    write(IntToHex(Ord(s[i]),2), ' ');
  writeln;

This prints

2B 0C 02 88 53 03 42 03 01 

where the bytes correspond to the input as follows:

  • 2B = 1.3 -> 1 * 40 + 3 = 43
  • 0C = 12
  • 02 = 2
  • 88 53 = 1107
  • 03 = 3
  • 42 = 66
  • 03 = 3
  • 01 = 1
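The two-byte group for 1107 can be checked by hand: each byte contributes its low 7 bits, and bit 7 only signals that another byte follows.

```latex
\begin{aligned}
\texttt{0x88} &= 1\,0001000_2 &&\Rightarrow \text{more bytes follow, payload } 0001000_2 = 8\\
\texttt{0x53} &= 0\,1010011_2 &&\Rightarrow \text{last byte, payload } 1010011_2 = 83\\
8 \cdot 128 + 83 &= 1024 + 83 = 1107
\end{aligned}
```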

See https://onlinegdb.com/obsAGIGmJ for the full program.

Bodo