I have found a Cygwin-specific solution to this... and also what may be (?) the only way to intercept, isolate and identify "keyboard control" character input.
Getting correct Unicode input using JLine and Cygwin
As referenced here in my own answer to a question I asked a year ago, Cygwin (in my setup anyway) needs some sort of extra buffering and encoding, both for console input and output, if it is to handle Unicode properly.
To apply this AND to apply JLine at the same time, I do this, after calling terminal.enterRawMode():
BufferedReader br = new BufferedReader( new InputStreamReader( terminal.input(), 'UTF-8' ))
NB terminal.input() returns an org.jline.utils.NonBlockingInputStream instance.
Entering "ẃ" (AltGr + W on a UK Extended keyboard) is then consumed in one br.read() command, and the int value produced is 7811, the correct codepoint value (U+1E83). Hurrah: a non-ASCII Unicode character, which takes three bytes in UTF-8, has been correctly consumed in a single read.
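The decoding step can be checked on its own, without a terminal. This is a minimal sketch: the ByteArrayInputStream is only a stand-in for terminal.input() (an assumption for illustration), fed with the UTF-8 bytes of "ẃ". Note that read() returns one UTF-16 code unit, so a character above U+FFFF would need two reads.

```groovy
import java.nio.charset.StandardCharsets

// "ẃ" is U+1E83; written as an escape so the source file encoding does not matter
String typed = '\u1E83'
byte[] utf8Bytes = typed.getBytes(StandardCharsets.UTF_8)

// Stand-in for terminal.input(): the raw bytes the terminal would deliver
def rawInput = new ByteArrayInputStream(utf8Bytes)
def br = new BufferedReader(new InputStreamReader(rawInput, StandardCharsets.UTF_8))

int first = br.read()    // one read() consumes all three bytes
int second = br.read()   // end of the stand-in stream

println "bytes=${utf8Bytes.length} first=$first second=$second"
// prints: bytes=3 first=7811 second=-1
```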
Handling keyboard control character bytes:
But I also want to intercept, isolate and correctly identify bytes corresponding to various control characters. TAB is one byte (9) and BACKSPACE is one byte (127), so they are easy to deal with, but UP-ARROW is delivered in the form of 3 separately read bytes, i.e. three separate br.read() commands are unblocked, even using the above BufferedReader. Some control sequences contain 7 such bytes, e.g. Ctrl-Shift-F5 is 27 (escape) followed by 6 other separately read bytes, int values: 91, 49, 53, 59, 54, 126. I haven't yet found where such sequences may be documented: if anyone knows please add a comment.
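Rendered as characters, the seven values read ESC [ 1 5 ; 6 ~. That shape matches the function-key-with-modifier form described in xterm's control sequence documentation (15 for F5, modifier value 6 for Shift+Ctrl), although whether the Cygwin terminal follows xterm in every case is an assumption here. A small sketch for making such sequences readable; describe and parseCsi are hypothetical helper names, and the UP-ARROW values 27, 91, 65 are the usual xterm form rather than something measured above:

```groovy
// Show each value as a character, with 27 written as ESC
String describe(List<Integer> values) {
    values.collect { it == 27 ? 'ESC' : ((char) it).toString() }.join(' ')
}

// Split an "ESC [ <params> <final>" sequence into its numeric parameters
// and its final character. Returns null if the values do not start with ESC [
Map parseCsi(List<Integer> values) {
    if (values.size() < 3 || values[0] != 27 || values[1] != 91) {
        return null
    }
    String body = values.drop(2).collect { ((char) it).toString() }.join('')
    String paramText = body.substring(0, body.length() - 1)
    List<Integer> params = paramText ? paramText.split(';').collect { it.toInteger() } : []
    return [params: params, finalChar: body.substring(body.length() - 1)]
}

def ctrlShiftF5 = [27, 91, 49, 53, 59, 54, 126]
println describe(ctrlShiftF5)     // ESC [ 1 5 ; 6 ~
println parseCsi(ctrlShiftF5)     // params 15 and 6, final character ~
println parseCsi([27, 91, 65])    // no params, final character A
```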
It is then necessary to isolate these "grouped bytes": i.e. you have a stream of bytes: how do you know that these 3 (or 7...) have to be interpreted jointly?
This is possible by taking advantage of the fact that when multiple bytes are delivered for a single such control character they are delivered with less than one millisecond between each. Not that surprisingly perhaps. This Groovy script seems to work for my purposes:
import org.apache.commons.lang3.StringUtils
@Grab(group='org.jline', module='jline', version='3.7.0')
@Grab(group='org.apache.commons', module='commons-lang3', version='3.7')
def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()
terminal.enterRawMode()
// BufferedReader needed for correct Unicode input using Cygwin
BufferedReader br = new BufferedReader( new InputStreamReader(terminal.input(), 'UTF-8' ))
// PrintStream needed for correct Unicode output using Cygwin
outPS = new PrintStream(System.out, true, 'UTF-8' )
userResponse = ''
int readInt
boolean continueLoop = true
while( continueLoop ) {
    readInt = br.read()
    while( readInt == 27 ) {
        println "escape"
        long startNano = System.nanoTime()
        long nanoDiff = 0
        // figure of 500000 nanoseconds arrived at by experimentation: see below
        while( nanoDiff < 500000 ) {
            readInt = br.read()
            long timeNow = System.nanoTime()
            nanoDiff = timeNow - startNano
            println "z readInt $readInt char ${(char)readInt} nanoDiff $nanoDiff"
            startNano = timeNow
        }
    }
    switch( readInt ) {
        case [10, 13]:
            println ''
            continueLoop = false
            break
        case 9:
            println '...TAB'
            continueLoop = false
            break
        case 127:
            // backspace
            if( ! userResponse.empty ) {
                print '\b \b'
                // chop off last character
                userResponse = StringUtils.chop( userResponse )
            }
            break
        default:
            char unicodeChar = (char)readInt
            outPS.print( unicodeChar )
            userResponse += unicodeChar
    }
}
outPS.print( "userResponse |$userResponse|")
br.close()
terminal.close()
The above code enables me to successfully "isolate" the individual multi-byte keyboard control characters:
The 3 dots in the println '...TAB' line are printed on the same line, immediately after the user has pressed TAB (which with the above code is not printed on the input line). This opens the door to doing things like "autocompletion" of lines as in certain BASH commands...
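The grouping rule itself (an escape byte, plus every byte that follows within the threshold, belongs to one key press) can also be tried away from the terminal. This is only a sketch: groupKeyPresses is a hypothetical helper, and the timestamps are made-up values standing in for System.nanoTime() readings.

```groovy
// Each event is [value, nanoTime]. Returns one list of values per key press.
List<List<Integer>> groupKeyPresses(List events, long thresholdNanos) {
    List<List<Integer>> groups = []
    long previousTime = 0
    boolean inEscape = false
    for (event in events) {
        int value = event[0] as int
        long time = event[1] as long
        boolean closeEnough = (time - previousTime) < thresholdNanos
        if (inEscape && closeEnough) {
            groups[-1] << value        // continuation of the current escape sequence
        } else {
            groups << [value]          // a new key press
            inEscape = (value == 27)
        }
        previousTime = time
    }
    return groups
}

long ms = 1000000
def events = [
    [104, 0],                          // 'h'
    [27, 40 * ms],                     // start of UP-ARROW
    [91, 40 * ms + 60000],
    [65, 40 * ms + 130000],
    [105, 90 * ms]                     // 'i'
]
println groupKeyPresses(events, 500000)
// prints: [[104], [27, 91, 65], [105]]
```

As in the script, two escape sequences arriving within the threshold of each other would be merged into one group.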
Is this setting of 500000 nanoseconds (0.5 ms) fast enough? Maybe!
The fastest typists can type at 220 words per minute. Assuming an average of 8 characters per word (which seems high), this works out at 29 characters per second, or approximately 34 ms per character, so in theory things should be OK. A "rogue" pressing of two keys simultaneously might possibly deliver them less than 0.5 ms apart... however, with the above code this only matters if both of these are escape sequences. It seems to work OK.
The threshold can't really be much less than 500000 ns according to my experiments, because it can take up to 70000 - 80000 ns between each byte in a multi-byte sequence (although it usually takes less)... and all sorts of interrupts or funny things happening might of course interfere with delivery of these bytes. In fact setting it to 1000000 (1 ms) seems to work fine.
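The back-of-envelope figures can be checked in a few lines. The inputs are the ones already quoted (220 words per minute, 8 characters per word, a 500000 ns threshold and 80000 ns as the slowest inter-byte gap observed); nothing here is a new measurement.

```groovy
double wordsPerMinute = 220
double charsPerWord = 8
long thresholdNanos = 500000
long slowestGapNanos = 80000       // upper end of the 70000 - 80000 ns range

double charsPerSecond = wordsPerMinute * charsPerWord / 60
double msPerChar = 1000 / charsPerSecond
double thresholdMs = thresholdNanos / 1000000
double typingMargin = msPerChar / thresholdMs          // how much slower typing is than the threshold
double gapMargin = thresholdNanos / slowestGapNanos    // how much slack the threshold leaves per byte

println "${charsPerSecond.round(1)} chars/s, ${msPerChar.round(1)} ms per character"
println "typing margin ${typingMargin.round(1)}x, gap margin ${gapMargin}x"
// prints: 29.3 chars/s, 34.1 ms per character
// prints: typing margin 68.2x, gap margin 6.25x
```

So the threshold sits roughly 6 times above the slowest gap seen inside a sequence and roughly 68 times below the interval between key presses of a very fast typist.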
NB we now seem to have a problem with the above code if we want to intercept and deal with escape sequences: the above code blocks on br.read() inside the nanoDiff while loop at the end of the escape sequence. This is OK though, because we can track the byte sequence we are receiving as that while loop happens (before it blocks).
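One possible way to do that tracking, sketched under an assumption: that the sequences follow the standard "CSI" layout of ESC, then [, then parameter characters, then a single final character in the range 64 to 126 (@ to ~), as both examples above do. If so, the end of a sequence can be recognised from the values themselves, and no further read() is issued after it. readCsiAfterEscape is a hypothetical helper, and the StringReader stands in for the terminal-backed BufferedReader.

```groovy
// Call after a read() has returned 27. Returns all values of the sequence,
// including the leading 27. If the next character is not '[', returns
// [27, thatCharacter] and leaves the interpretation to the caller.
List<Integer> readCsiAfterEscape(Reader reader) {
    List<Integer> sequence = [27]
    int next = reader.read()
    if (next == -1) {
        return sequence
    }
    sequence << next
    if (next != 91) {
        return sequence
    }
    while (true) {
        next = reader.read()
        if (next == -1) {
            break                       // input ended mid-sequence
        }
        sequence << next
        if (next >= 64 && next <= 126) {
            break                       // final character: the sequence is complete
        }
    }
    return sequence
}

def reader = new StringReader('\u001b[15;6~x')
int firstValue = reader.read()                    // 27
def sequence = readCsiAfterEscape(reader)
int afterSequence = reader.read()                 // the 'x' typed afterwards is untouched
println "$firstValue $sequence $afterSequence"
// prints: 27 [27, 91, 49, 53, 59, 54, 126] 120
```

A bare Escape key press would still block in the first read() of this helper, so the timing check remains useful for that case, and sequences that do not start with ESC [ are not covered.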