
When running the following code

do line <- getLine
   putStrLn line

or,

getLine >>= putStrLn

and, after running

 getLine >>= putStrLn

entering

µ

one encounters this output:

I already tried running chcp 65001 beforehand, which doesn't help, and the encoding of stdin is UTF-8.

Running getLine on its own, without putStrLn, shows:

 getLine
µ
"\NUL"

My environment:
Windows 10 Version 10.0.17134 Build 17134
Lenovo ideapad 510-15IKB
BIOS Version LENOVO 3JCN30WW
GHCi v 8.2.2

How can this be solved?

EDIT: Specifically, the following sequence of actions causes this:

  1. Open cmd
  2. Type chcp 65001
  3. Type ghci
  4. Type getLine >>= putStrLn
  5. Type µ

However, the following does not:

  1. Search for ghci
  2. Open ghci.exe at %PROGRAMS%\Haskell Platform\8.2.2\bin
  3. Repeat 4-5.

NOTE: %PROGRAMS% is not a real environment variable.

EDIT: As requested, the output of GHC.IO.Encoding.getLocaleEncoding:

UTF-8

Also, the output of System.IO.hGetEncoding stdin:

Just UTF-8

(when using chcp 65001)
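
For anyone who wants to compare a working and a faulty session, the two checks above can be collected into one helper. This is only a sketch: describeEncodings and reportEncodings are hypothetical names introduced here, and the only library calls used are getLocaleEncoding and hGetEncoding, the same ones quoted above.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (Handle, hGetEncoding, stderr, stdin, stdout)

-- | Collect the locale encoding and the encodings of the standard handles.
-- (Hypothetical helper; not part of base.)
describeEncodings :: IO [(String, String)]
describeEncodings = do
  locale <- getLocaleEncoding
  handles <- mapM describe [("stdin", stdin), ("stdout", stdout), ("stderr", stderr)]
  pure (("locale", show locale) : handles)
  where
    describe :: (String, Handle) -> IO (String, String)
    describe (name, h) = do
      enc <- hGetEncoding h
      -- hGetEncoding returns Nothing for a handle in binary mode.
      pure (name, maybe "binary" show enc)

-- | Print one "name: encoding" line per entry.
reportEncodings :: IO ()
reportEncodings =
  describeEncodings >>= mapM_ (\(name, enc) -> putStrLn (name ++ ": " ++ enc))
```

Loading this in GHCi and running reportEncodings in both sessions shows whether the two differ in anything other than the code page.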

EDIT: The character is U+00B5. I am using a German keyboard, system locale Germany, language setting English, Keyboard language ENG with German layout.

schuelermine
  • This is on Windows? – that other guy Aug 02 '18 at 20:14
  • This works as expected for me. Could you provide more details about your environment? – DarthFennec Aug 02 '18 at 20:16
  • MacOS X with ghc 8.0.2 works as expected. – pedrofurla Aug 03 '18 at 04:21
  • Are you sure the stdin you’re giving is utf8 and not utf16? And is that letter a mu (U+03BC) or a micro sign (U+00B5)? – Dan Robertson Aug 03 '18 at 08:02
  • In the faulty `ghci` session, what does `GHC.IO.Encoding.getLocaleEncoding` say? – Daniel Wagner Aug 03 '18 at 14:15
  • You may want to search the GHC Trac for related tickets. I imagine you're not the first to encounter such a problem. If you can't find something, open a ticket. FYI: At the moment, the go-to expert on Windows-specific problems seems to be Tamar Christina. He usually uses the handle Phyx or Phyx- online. – dfeuer Aug 03 '18 at 18:41
  • @Mark Neu, just in case you are really curious to see a workaround, I made an edit to my answer with a minimal working solution. – lehins Aug 06 '18 at 23:22

1 Answer


Console input/output is utterly broken on Windows and has been for some time now. Here is the top ticket that tracks all the issues related to IO on Windows: https://ghc.haskell.org/trac/ghc/ticket/11394

I believe these two tickets best describe the behavior that you are experiencing:

The only workaround right now is to use the Windows API directly for console input/output, which is a pain of its own.

EDIT

So, just for the hell of it I decided to endure some of that pain. :)

Here is the output of the code below:

====
Input: µ
Output: µ
====

This is by no means a fully correct or a safe solution, but it does work:

module Main where

import Control.Monad
import System.IO
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.String
import Foreign.C.Types
import Foreign.Storable

import System.Win32
import System.Win32.Types
import Graphics.Win32.Misc

foreign import ccall unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

foreign import ccall unsafe "windows.h ReadConsoleW"
  c_ReadConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

-- | Read up to maxLen UTF-16 code units from a handle, which should be a console stdin.
hwGetStrN :: Int -> Handle -> IO String
hwGetStrN maxLen hdl = do
  withCWStringLen (Prelude.replicate maxLen '\NUL') $ \(cstr, len) -> do
    lpNumberOfCharsReadForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsReadForeignPtr $ \lpNumberOfCharsRead -> do
        -- The BOOL result is ignored here; a real program should check it.
        _ <- c_ReadConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsRead nullPtr
        numRead <- peek lpNumberOfCharsRead
        peekCWStringLen (cstr, fromIntegral numRead)

-- | Write a string to a handle, which should be a console stdout or stderr.
hwPutStr :: Handle -> String -> IO ()
hwPutStr hdl str = do
  void $ withCWStringLen str $ \(cstr, len) -> do
    lpNumberOfCharsWrittenForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsWrittenForeignPtr $ \lpNumberOfCharsWritten ->
        c_WriteConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsWritten nullPtr

main :: IO ()
main = do
  hwPutStr stdout "====\nInput: "
  str <- hwGetStrN 10 stdin
  hwPutStr stdout "Output: "
  hwPutStr stdout str
  hwPutStr stdout "====\n"

EDIT 2

@dfeuer asked me to list things that are unsafe, incorrect or incomplete with that answer. I only really code on Linux, so I am not a Windows programmer, but here are the things that pop into my mind that would need to be changed before that code could be used in a real program:

  • Most importantly, the code works only with console handles. Whether a handle is a console handle can be determined with the GetConsoleMode API call.
  • For other types of handles, e.g. pipes or file handles, the code above will do nothing. Those have their own encoding issues, but that is a totally separate problem.
  • API call failures aren't accounted for. We'd have to check whether each call succeeded by looking at the returned BOOL and, whenever it didn't, use GetLastError to report the error back to the user.
  • The functions implemented above are very limited: there are no checks on how much they've actually read from or written to the buffer. For that reason hwGetStrN can only handle n characters, so a recursive call would be required to get behavior similar to hGetLine.
  • None of the sanity checks are done, e.g. DWORD is Word32, so the fromIntegral len call is susceptible to integer overflow, which is both incorrect and unsafe.
  • FFI calls must use stdcall on a 32-bit OS and ccall on x86_64, so some CPP is necessary.
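
Two of the points above, the DWORD overflow and the repeated read needed for hGetLine-like behavior, can be sketched without touching the Windows API at all. The names toDWord and readLineWith are hypothetical and introduced only for illustration. The chunk reader is passed in as an argument, standing in for something like hwGetStrN 256 stdin, and the sketch assumes that a console read of a long line returns the remainder on the next call and that the line ends with "\r\n".

```haskell
import Data.Word (Word32)

-- | Convert a buffer length to a DWORD-sized value (DWORD is Word32),
-- returning Nothing instead of silently wrapping around.
toDWord :: Int -> Maybe Word32
toDWord n
  | n < 0 = Nothing
  | toInteger n > toInteger (maxBound :: Word32) = Nothing
  | otherwise = Just (fromIntegral n)

-- | Read one line by calling a chunk reader until a chunk contains '\n'
-- or the reader returns an empty chunk. Anything after the '\n' within
-- the final chunk is dropped, and one trailing '\r' is stripped.
readLineWith :: Monad m => m String -> m String
readLineWith readChunk = go []
  where
    go acc = do
      chunk <- readChunk
      let (before, rest) = break (== '\n') chunk
      if null chunk || not (null rest)
        then pure (stripCR (concat (reverse (before : acc))))
        else go (chunk : acc)

    -- Remove a single trailing carriage return, if present.
    stripCR s = case reverse s of
      ('\r' : xs) -> reverse xs
      _ -> s
```

With the functions from the answer in scope, readLineWith (hwGetStrN 256 stdin) would be the intended use, and toDWord len would replace the bare fromIntegral len, with the Nothing case reported as an error.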
lehins
  • Can you point out in your answer what specifically is likely to be incomplete, incorrect, or unsafe, if you have an inkling? – dfeuer Aug 07 '18 at 00:42
  • @dfeuer, This sort of functionality should certainly be in `base`, and I can't recommend using the sample code I provided. Regardless of that, I've updated the answer with the things I can think of that would have to be fixed if that code is to be used anywhere. – lehins Aug 07 '18 at 09:51
  • Ah, look, with a little bit of searching I found a similar, but more extensive solution proposed 6 years ago :) https://stackoverflow.com/questions/10779149/unicode-console-i-o-in-haskell-on-windows – lehins Aug 07 '18 at 22:10