I'm facing a silly but annoying situation when trying to display UTF-8 text in a Delphi XE7 console application. It seems that ReadLn only reads UTF-8 characters correctly on the second attempt. For example:

    program ConsTest;

    {$APPTYPE CONSOLE}

    {$R *.res}

    uses
      System.SysUtils,
      System.Classes,
      WinApi.Windows;

    var
      CurrentCodePage: Integer;
      Command: String;
      Running: Boolean;
      MyText: String;

    begin
      CurrentCodePage := GetConsoleOutputCP;
      SetConsoleOutputCP(CP_UTF8);
      SetTextCodePage(Output, CP_UTF8);

      MyText := 'Olá mundo.';
      WriteLn(MyText);

      Running := True;
      while Running do
      begin
        ReadLn(Command);
        WriteLn(Command);
        if (Command = '/q') then
          Running := false;
      end;

      SetConsoleOutputCP(CurrentCodePage);
      SetTextCodePage(Output, CurrentCodePage);
    end.

In the example above, just after I run the application, if I enter the following text:

'Olá mundo.'

The WriteLn will show:

'Ol mundo.'

After the first pass, all UTF-8 characters read by ReadLn are displayed correctly. Is there a problem with this code? I searched the web for more details, but I didn't find any information related to this. The call "WriteLn(MyText);" at the beginning of the code shows the text 'Olá mundo.' correctly.

Arioch 'The
André Murta
  • can you close and reset the Input file after switching the Windows console to UTF-8? Read/Write have their own internal buffer, which might be pre-filled and pre-converted to UCS-2 before you switch the code page – Arioch 'The Jun 08 '15 at 12:27
  • also I think `SetTextCodePage` should be called on `Input` as well as on `Output`. Actually, that might be better or worse than doing `CloseFile/Reset` on `Input` – Arioch 'The Jun 08 '15 at 12:31
  • Related: http://stackoverflow.com/questions/26255148/is-writeln-capable-of-supporting-unicode – David Heffernan Jun 08 '15 at 12:32
  • I think @TLama was correct to point at the `SetConsoleCP` function https://msdn.microsoft.com/ru-ru/library/windows/desktop/ms686013.aspx - however, whether it would work with UTF-8, and in which Windows versions, is left for trial and error: https://groups.google.com/forum/#!topic/microsoft.public.win32.programmer.international/44qpI6MsIPk OTOH there are alternative consoles like MSYS, 4NT/TakeW32, Cygwin, maybe PowerShell... – Arioch 'The Jun 08 '15 at 12:37
  • LU RD, David: the topic starter says he has problems with INPUT, not OUTPUT /// Andre, I think for debugging you had better add the check `if MyText=Command then writeln ('yes') else writeln('no');` right after the ReadLn – Arioch 'The Jun 08 '15 at 12:39
  • @Arioch You can't see the connection. That if output is broken then input might also be. – David Heffernan Jun 08 '15 at 12:52
  • Try calling [`SetConsoleCP`](https://msdn.microsoft.com/en-us/library/windows/desktop/ms686013%28v=vs.85%29.aspx) – Ondrej Kelle Jun 08 '15 at 13:04
  • @Arioch I can also see multiple calls to Writeln. – David Heffernan Jun 08 '15 at 13:08
  • @DavidHeffernan I also see few variables and a while loop there, so what? /// yep, indeed, two DIFFERENT file variables opened for DIFFERENT file handles and used by DIFFERENT set of RTL function - indeed, if they are related by some weird side-effects, then I guess it should be shown. Assumption #0 is that only related variables and functions are related, not ReadLn(Input) and WriteLn(Output). – Arioch 'The Jun 08 '15 at 13:18
  • @Arioch I tried this application under Windows 7 and Windows 8.1 – André Murta Jun 08 '15 at 13:20
  • @AndréMurta the problem is which would be the OS range at your clients' sites... I guess Delphi console app might be launched on Win2000 if not NT4 (casting aside ReactOS and Linux), so compatibility would be limited by your clients configs, not by your development machine's one – Arioch 'The Jun 08 '15 at 13:22
  • @AndréMurta but at least, like TLama and I said above, try changing the code page for the input console and the input variable as well, as you do for the output ones. And add the above-mentioned string comparison between ReadLn and WriteLn so you would know which of the two functions actually fails. – Arioch 'The Jun 08 '15 at 13:26
  • @Arioch never mind, I can't face this argument. Especially as your latest comment accords with what I said!!! – David Heffernan Jun 08 '15 at 13:26
  • @Arioch I'm running this application under Windows 7 and Windows 8.1 using the Windows console (cmd.exe), I followed your suggestion and tried it under cygwin. The result was the one expected, all UTF-8 chars are being displayed ok from the beginning. It seems this situation only happens under the Windows standard console. – André Murta Jun 08 '15 at 13:27
  • I wonder if someone bothered to make http://delphicrt.sourceforge.net/ Unicode-aware... Seems no one did. – Arioch 'The Jun 08 '15 at 13:27
  • @DavidHeffernan the difference is that I am saying "TS, do THIS to check if your problem is really where you think and said it is, not in another function" and you say "TS, your problem is in another place, not the one you told us" - very different approaches, don't you think? I help him to extend the search range, you just discard his words for no reason at all. Do you even see the WriteLn(const) function being executed before any ReadLn is ever called ??? – Arioch 'The Jun 08 '15 at 13:31
  • @AndréMurta still I suggest you'd try to switch input-console to UTF-8 and/or close and reopen `input` file var in your app and check the results in stock console in different Windows versions – Arioch 'The Jun 08 '15 at 13:32
  • @Arioch'The I did not say that. I provided a link to a related question. Please re-read my comment. – David Heffernan Jun 08 '15 at 13:35
  • @DavidHeffernan that link I upvoted, as a bonus info, but there was/were more comment(s) now deleted – Arioch 'The Jun 08 '15 at 13:36
  • @Arioch'The I deleted nothing. Anyway, my view here is that no amount of furkling with code pages gets the job done. You need to use the Unicode console API. – David Heffernan Jun 08 '15 at 13:36
  • @Arioch'The At the question I linked to, although I accepted LURD's answer, I would not use that code. I would use `WriteConsoleW`. And for reading, `ReadConsoleW`. – David Heffernan Jun 08 '15 at 13:38
  • that depends on what the Delphi RTL is using: if it uses the Unicode Console API (as it should), then it should work. If not, maybe he'd fork DelphiCRT and enhance it... /// another thing is that he does not use Unicode ranges there; his symbols fit perfectly into the standard default 437/1250/1252 code pages. Actually, he does not even need Unicode for his specific test – Arioch 'The Jun 08 '15 at 13:39
  • @DavidHeffernan again, if all he wants is to read the whole string at once, without parsing it into several vars, then ReadConsoleW should be enough, and I guess TTextReader too - http://stackoverflow.com/questions/2815839 – Arioch 'The Jun 08 '15 at 13:41
  • @Arioch Using the same code I gave as an example, I tried the test you suggested (`if MyText=Command then writeln ('yes') else writeln('no');`). It evaluates to false on the first pass and true on all subsequent passes. If I change the input code page to UTF-8, it always fails. – André Murta Jun 08 '15 at 14:02
  • Actually, if I remove all the calls to SetTextCodePage(Output, ...); the program has the same behaviour. That's really annoying. – André Murta Jun 08 '15 at 14:06
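
For readers who want to try the `ReadConsoleW`/`WriteConsoleW` route suggested in the comments above, here is a rough sketch. It is not code from the thread: `ConsoleReadLn` and `ConsoleWriteLn` are hypothetical helper names, and the 1024-character buffer is an arbitrary assumption. The idea is to bypass the RTL text-file layer, and with it any code-page conversion, by exchanging UTF-16 text with the console directly.

```pascal
program ConsUnicode;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Winapi.Windows;

// Hypothetical helper: writes S plus a line break as UTF-16 via WriteConsoleW.
procedure ConsoleWriteLn(const S: string);
var
  Line: string;
  Written: DWORD;
begin
  Line := S + sLineBreak;
  WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), PChar(Line),
    Length(Line), Written, nil);
end;

// Hypothetical helper: reads one line as UTF-16 via ReadConsoleW.
// Assumption: lines fit in 1024 characters; longer input is not handled here.
function ConsoleReadLn: string;
var
  Buffer: array[0..1023] of WideChar;
  CharsRead: DWORD;
begin
  Result := '';
  CharsRead := 0;
  if ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), @Buffer[0],
    Length(Buffer), CharsRead, nil) then
  begin
    SetString(Result, PWideChar(@Buffer[0]), Integer(CharsRead));
    // Strip the trailing CR/LF that the console returns with the line.
    while (Result <> '') and
      CharInSet(Result[Length(Result)], [#13, #10]) do
      SetLength(Result, Length(Result) - 1);
  end;
end;

var
  Command: string;

begin
  ConsoleWriteLn('Ol'#$00E1' mundo.');
  repeat
    Command := ConsoleReadLn;
    ConsoleWriteLn(Command);
  until Command = '/q';
end.
```

Both calls only work when the handle is a real console. If standard input or output is redirected to a file or pipe they fail, so a fallback to ReadLn/WriteLn would be needed for that case, and the console font still has to contain the glyphs being printed.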

1 Answer

OK, after paying a bit more attention to the first comment from Arioch, I tried the code below, which works perfectly on any console.


    program ConsTest;

    {$APPTYPE CONSOLE}

    {$R *.res}

    uses
      System.SysUtils,
      System.Classes,
      WinApi.Windows;

    var
      Command: String;
      Running: Boolean;
      MyText: String;

    begin
      MyText := 'Olá mundo.';
      WriteLn(MyText);

      Reset(Input); //*** That's the catch. ***

      Running := True;
      while Running do
      begin
        ReadLn(Input, Command);
        if (MyText = Command) then
          WriteLn('Yes')
        else
          WriteLn('No');

        WriteLn(Command);
        if (Command = '/q') then
          Running := false;
      end;
    end.

*Note: The code above does not work for certain alphabets; I need a better understanding of Unicode and console mode in Delphi. By coincidence it solved my problem, but it cannot really be considered an answer. In any case, the call to "Reset(Input)" seems to be necessary to make sure the first call to ReadLn works properly.
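
To see whether it is ReadLn or WriteLn that loses the 'á', it can help to look at what ReadLn actually returned instead of how it is displayed. The following is only a diagnostic sketch under my own assumptions (`CodeUnitsHex` is a hypothetical helper, not part of the RTL): it prints the console and ANSI code pages currently in effect and then dumps the UTF-16 code units of the line that was read.

```pascal
program DumpInput;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Winapi.Windows;

// Hypothetical helper: renders each UTF-16 code unit of S as 4 hex digits.
function CodeUnitsHex(const S: string): string;
var
  C: Char;
begin
  Result := '';
  for C in S do
    Result := Result + IntToHex(Ord(C), 4) + ' ';
  Result := TrimRight(Result);
end;

var
  Command: string;

begin
  // Code pages in effect when the program starts.
  WriteLn('Input CP: ', GetConsoleCP,
    '  Output CP: ', GetConsoleOutputCP,
    '  ANSI CP: ', GetACP);

  ReadLn(Command);
  // 'á' is U+00E1, so a correctly decoded 'Olá' contains the unit 00E1.
  WriteLn(CodeUnitsHex(Command));
end.
```

The dump only uses ASCII characters, so it stays readable whatever the output code page is. If the unit 00E1 is missing from the dump, the character was lost while reading; if it is present, the loss happens on output.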

André Murta
  • Your question was about Unicode, specifically UTF-8. I sense confusion. – David Heffernan Jun 08 '15 at 15:12
  • @Arioch Actually, the "misbehavior" was only under the Windows console. It always worked on Cygwin. I'm used to developing web applications in PHP; in that language, I have to change the code set to UTF-8 in order to correctly display the Portuguese characters in the web browser. I did not realize until after this post that the XE7 string type is Unicode by default and that it's not necessary to change a thing about the console input/output, "mea culpa, mea maxima culpa", I'm really sorry. Thanks for your support. – André Murta Jun 08 '15 at 15:41
  • Well, as soon as you leave the ANSI code page then you'll hit trouble. Delphi Writeln/Readln don't support Unicode. This answer doesn't seem to me to match the question that you asked. – David Heffernan Jun 08 '15 at 17:04
  • @David OK David, forgive my ignorance, but I need more info about this one. When I faced the problem I described above, I thought it was happening due to some misconfiguration between my program and the Windows console code page. -> – André Murta Jun 08 '15 at 18:18
  • @David Then, I discovered that in XE7 the string type is actually an alias for UnicodeString, my understanding was that XE7 is already dealing with the extended portuguese characters like áéíóúãõàçÇÁÉÍÓÚ... and I don't have to worry about it. Actually, the code I posted as my answer works fine under the Windows Console and Cygwin for any of these extended chars I have to deal with, and I did not have to make any change in the Windows console configs. -> – André Murta Jun 08 '15 at 18:19
  • @David If I understood your last comment, you are telling me the ReadLn and WriteLn commands do not support Unicode and sometime, somehow I could face problems because I'm using these commands in my code. If that's the case, what should be the replacement for WriteLn and ReadLn? Also, regarding your comment about the ANSI code page, are you suggesting that I should keep the calls to SetTextCodePage in order to enforce the console to show the correct chars? (over) – André Murta Jun 08 '15 at 18:19
  • Well, try reading and writing some text outside your ANSI code page. Perhaps some Russian or Greek or Chinese. Make sure your console font can handle it. As to how to deal with Writeln and Readln I'd probably opt for the win32 api calls I mentioned in earlier comments to the Q. – David Heffernan Jun 08 '15 at 18:22
  • @David OK... I did some "cut & paste" and I think I'm starting to understand what you mean. That's puzzling (at least for me)... The code below is actually a problem: `Writeln('АБВГДЕЖЅZЗИІКЛМНОПҀРСТȢ');` But this one is not: `Writeln('ÁÉÍÓÚÃÕÇÀÈÌÒÙáéíóúãõçàèìòù');` Since my app only needs to deal with the Portuguese chars, what I have done is OK for me, but I would like to understand better what the differences between the Russian and Portuguese chars are. Do these alphabets have a different classification in Unicode? – André Murta Jun 08 '15 at 18:42
  • @David Are the portuguese chars being displayed in my console because the Windows I'm using has the portuguese language pack installed additionally to the english language pack? – André Murta Jun 08 '15 at 18:42
  • Have you selected the Consolas font in the console? – LU RD Jun 08 '15 at 18:48
  • Delphi's console library code uses the ANSI code page. I don't have the stamina to do this in comments to the answer to your question. – David Heffernan Jun 08 '15 at 18:50
  • The cyrillic and portuguese chars are displayed ok using either "consolas" or "lucida console" as the Windows console font. For the greek chars I have to change the font to Consolas. – André Murta Jun 08 '15 at 18:59
  • Andre, as was suggested in the comments to the question, you can try to use the ReadConsoleW Windows API directly, and I think you can try to use TTextReader if you open a TFileStream over the std-in (#0) file handle. Perhaps you can also download the DelphiCRT lib and port it to Unicode Delphi; it may or may not be trivial – Arioch 'The Jun 09 '15 at 09:53
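
On the question raised above about why the Portuguese characters behave differently from the Cyrillic ones, one way to explore it is to check whether a string survives a round trip through a single-byte code page. This is a sketch under assumptions, not something from the thread: `FitsInCodePage` is a hypothetical helper, and 1252 (Western European) is used only as an example of the kind of code page the console layer may convert through.

```pascal
program CodePageFit;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True when S converts to CodePage and back unchanged.
function FitsInCodePage(const S: string; CodePage: Integer): Boolean;
var
  Enc: TEncoding;
begin
  // GetEncoding returns a new instance that the caller must free.
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(Enc.GetBytes(S)) = S;
  finally
    Enc.Free;
  end;
end;

begin
  // 'Olá mundo.' written with a character code so the result does not
  // depend on the encoding of the source file. U+00E1 exists in 1252.
  WriteLn(FitsInCodePage('Ol'#$00E1' mundo.', 1252));

  // Cyrillic U+0410..U+0412 have no slot in 1252, so the round trip
  // cannot return the original characters.
  WriteLn(FitsInCodePage(#$0410#$0411#$0412, 1252));
end.
```

Text that passes this check for the code page in use can get through a code-page-based ReadLn/WriteLn; text that fails it needs the Unicode console API instead. Which code page actually applies on a given machine is something to verify rather than assume.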