
When I run the program, I see the following:

λ ashot Weather → λ git master* → stack exec Weather-exe
Пожалуйста, укажите дату для прогноза в формате ГГГГ-ММ-ДД:
2018-11-07
Пожалуйста, укажите один из этих марзов: [Aragatsotn,Ararat,Armavir,Dilijan,Gegharkunik,Gyumri,Kotayk,Shirak,Syunik,Vanadzor,Yerevan]
Yerevan 

Everything works so far. But when I enter an invalid value, I get this:

InvalidDate "\1058\1077\1082\1089\1090, \1082\1086\1090\1086\1088\1099\1081 \1074\1099 \1074\1074\1077\1083\1080 - \1101\1090\1086 \1082\1072\1082\1072\1103-\1090\1086 \1073\1077\1083\1080\1073\1077\1088\1076\1072!"

Instead of numeric codes such as \1058 and \1072, I expect to see the text in Russian. How can I fix this so that the Russian characters are displayed correctly?

tripleee
juice

1 Answer


You are probably calling show on strings. This may also happen implicitly, e.g. by print, or by asking GHCi to print a string.

Consider this GHCi session:

> str = "Пожалуйста"                                           
> str                                                          
"\1055\1086\1078\1072\1083\1091\1081\1089\1090\1072"                  

The variable str contains the right string. When we ask GHCi to print it, it calls print str implicitly, which calls show.

show, in turn, converts the string into an ASCII-only escaped form, turning every non-ASCII or non-printable character into an escape sequence (here, the decimal Unicode code point of each Cyrillic letter).

I stress that the string str is indeed the intended string: we can print it correctly if we use putStrLn, for instance:

> putStrLn str
Пожалуйста

Note that if you call show on a data type which contains a String inside (e.g. inside a constructor like InvalidDate), then that will in turn call show on the string, producing the unwanted escapes.
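To make this concrete, here is a minimal sketch of the effect. The type name WeatherError is an assumption for illustration; only the constructor name InvalidDate appears in the question, and the real type may have more constructors.

```haskell
-- Hypothetical error type: only the constructor name InvalidDate
-- is taken from the question; WeatherError is an assumed name.
newtype WeatherError = InvalidDate String
  deriving Show

main :: IO ()
main = do
  let err = InvalidDate "Пожалуйста"
  -- print calls show; the derived Show instance in turn calls show
  -- on the String field, which escapes every non-ASCII character.
  print err
  -- prints: InvalidDate "\1055\1086\1078\1072\1083\1091\1081\1089\1090\1072"
```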

I would suggest writing a custom pretty-printer for your type and using that instead of show, so that strings are not mangled by this escaping.
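A pretty-printer along these lines might look as follows. This is only a sketch: the type name WeatherError, the function name renderError, and the "Invalid date: " prefix are all assumptions, since the question does not show the actual type definition.

```haskell
-- Hypothetical error type; only InvalidDate comes from the question.
newtype WeatherError = InvalidDate String
  deriving Show

-- Hypothetical pretty-printer: builds the user-facing message by
-- concatenating the String directly, never passing it through show.
renderError :: WeatherError -> String
renderError (InvalidDate msg) = "Invalid date: " ++ msg

main :: IO ()
main =
  -- putStrLn writes the characters as they are, so the Cyrillic text
  -- reaches the terminal unescaped (assuming the terminal's encoding
  -- can represent it).
  putStrLn (renderError (InvalidDate "Пожалуйста"))
```

The derived Show instance is kept for debugging, where seeing the escapes can still be useful, while user-facing output goes through renderError and putStrLn.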

chi
  • That behaviour of the `Show` instance is due for an overhaul... I don't think Unicode characters should pose much of a problem anywhere anymore. – leftaroundabout Nov 07 '18 at 19:58
  • @leftaroundabout I agree. Escapes should only be used for non printable or otherwise "funny" characters. (During debugging it's still useful to see the escapes, sometimes, to distinguish between similar but different chars) – chi Nov 07 '18 at 20:05
  • At the very least, it would be nice for such a function (one that escapes only the "funny" characters) to exist in `base`. – Alec Nov 07 '18 at 20:15
  • @leftaroundabout printing non-ascii characters on Windows using the functions in `System.IO` is still dangerous. – Jeremy List Nov 28 '18 at 21:44
  • @JeremyList the functions from `System.IO` just encode the string to UTF-8. That should be safe on any system with 8-bit bytes, i.e. on any system. Whether these characters can actually be _displayed_ is another question, but it should at any rate be safe to try. If the Windows console for some reason can't handle it (which would be utterly pathetic), that means it is broken and shouldn't be used. – leftaroundabout Nov 28 '18 at 23:21
  • @leftaroundabout `System.IO` on Windows doesn't just use UTF-8: it checks the encoding of the associated windows console and uses that; by default throwing an exception on any unsupported character (`hSetEncoding` and friends are needed to avoid this) – Jeremy List Nov 29 '18 at 00:06
  • Ok, so what's unsafe then? – leftaroundabout Nov 29 '18 at 00:09
  • The exceptions make it unsafe. – Jeremy List Nov 29 '18 at 00:16
  • There's nothing really worrying about an exception in GHCi. – leftaroundabout Nov 29 '18 at 00:46
  • GHCi raises an exception since the windows console can't handle the unicode chars. This should not happen for an output message. I can't blame GHCi for this: it has no alternative. Instead, the windows console should handle unicode more gracefully: as it is, it is broken. `chcp 65001` handles some unicode (even if it prints garbage for non ascii chars), and should be the default encoding for the console. Some crashes still happen, though. WinGHCi is similarly broken. The only almost sane option seems to be ConEmu (non-wide-char output is ok, keyboard input still broken). – chi Nov 29 '18 at 09:23
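
The `hSetEncoding` call mentioned in the comments can be sketched as below. This is a minimal example, assuming UTF-8 is the encoding you want on stdout; the name `message` is hypothetical, and whether the console then displays the characters correctly still depends on the console itself, as discussed above.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical message used only for this illustration.
message :: String
message = "Пожалуйста"

main :: IO ()
main = do
  -- Force stdout to encode text as UTF-8 instead of using the
  -- encoding derived from the locale or console code page.
  hSetEncoding stdout utf8
  putStrLn message
```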