JLine: the contract for NonBlockingReader seems broken

Question

This follows on from my previous question about JLine. OS: Windows 10, using Cygwin.

def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()
terminal.enterRawMode()
// NB the Terminal I get is class org.jline.terminal.impl.PosixSysTerminal
def reader = terminal.reader()
// class org.jline.utils.NonBlocking$NonBlockingInputStreamReader

def bytes = [] // NB class ArrayList
int readInt = -1
while( readInt != 13 && readInt != 10 ) {
    readInt = reader.read()
    byte convertedByte = (byte)readInt
    // see what the binary looks like:
    String binaryString = String.format("%8s", Integer.toBinaryString( convertedByte & 0xFF)).replace(' ', '0')
    println "binary |$binaryString|"
    bytes << (byte)readInt // NB means "append to list"

    // these seem to block forever, whatever the param... 
    // int peek = reader.peek( 50 ) 
    int peek = reader.peek( 0 )

}
// strip final byte (13 or 10)
bytes = bytes[0..-2]
def response = new String( (byte[])bytes.toArray(), 'UTF-8' )

According to the Javadoc (made locally from the source) peek looks like this:

public int peek(long timeout)

Peeks to see if there is a byte waiting in the input stream without actually consuming the byte.

Parameters: timeout - The amount of time to wait, 0 == forever

Returns: -1 on eof, -2 if the timeout expired with no available input or the character that was read (without consuming it).

It doesn't say what time units are involved here... I assume milliseconds, but I also tried with "1", just in case it's seconds.

This peek command is sufficiently functional as it stands for you to be able to detect multi-byte Unicode input, with a bit of time-out ingenuity: one presumes the bytes of a multi-byte Unicode character will arrive faster than a person can type...

However, if it never unblocks this means that you have to put the peek command inside a time-out mechanism which you have to roll yourself. The next character input will of course unblock things. If this is an Enter the while loop will then end. But if, say, you wanted to print a character (or do anything) before the next character is input the fact that peek's timeout doesn't appear to work prevents you doing that.
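The "roll your own time-out mechanism" idea mentioned above can be sketched in plain Java. This is a minimal sketch and not JLine's implementation: the class `TimedCharSource` and its constants are hypothetical names, and the only thing borrowed from the Javadoc quoted above is the convention of returning -2 when the timeout expires. A background thread does the blocking reads and hands characters over through a queue, so the caller can wait for a bounded time.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of JLine): reads from a Reader with a real timeout. */
final class TimedCharSource {
    static final int EOF = -1;
    static final int READ_EXPIRED = -2; // same convention as the Javadoc quoted above

    private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

    TimedCharSource(Reader source) {
        // The pump thread performs the blocking reads so the caller never has to.
        Thread pump = new Thread(() -> {
            try {
                int c;
                while ((c = source.read()) != -1) {
                    queue.put(c);
                }
                queue.put(EOF);
            } catch (IOException | InterruptedException e) {
                queue.offer(EOF); // treat a failed or interrupted read as end of input
            }
        }, "timed-char-source-pump");
        pump.setDaemon(true); // do not keep the JVM alive just for the pump
        pump.start();
    }

    /** Returns the next char, EOF, or READ_EXPIRED if nothing arrives within timeoutMillis. */
    int read(long timeoutMillis) throws InterruptedException {
        Integer c = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (c == null) {
            return READ_EXPIRED;
        }
        if (c == EOF) {
            queue.offer(EOF); // keep reporting EOF on later calls
        }
        return c;
    }

    public static void main(String[] args) throws Exception {
        PipedWriter keyboard = new PipedWriter(); // stands in for the terminal
        TimedCharSource in = new TimedCharSource(new PipedReader(keyboard));
        System.out.println(in.read(50));   // nothing typed yet, so the timeout expires: -2
        keyboard.write('a');
        System.out.println(in.read(1000)); // 97, the char value of 'a'
    }
}
```

With something like this, the loop in the question could do other work (printing, redrawing) whenever `read` returns `READ_EXPIRED`, instead of staying blocked until the next key press.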

score 1 · Answer 1 · answered Apr 25 '18 at 08:06


JLine uses the usual Java semantics: streams deal in bytes, readers/writers deal in chars. The only piece that deals with code points (i.e. characters that may need up to 32 bits in a single value) is the BindingReader. The NonBlockingReader follows the Reader semantics, simply adding some methods with a timeout that can return -2 to indicate that the timeout expired.

If you want to do the decoding yourself, you need to use the Character.isHighSurrogate method, as done by the BindingReader: https://github.com/jline/jline3/blob/master/reader/src/main/java/org/jline/keymap/BindingReader.java#L124-L144

int s = 0;
int c = reader.read(100L);
if (c >= 0 && Character.isHighSurrogate((char) c)) {
    s = c;
    c = reader.read(100L);
}
return s != 0 ? Character.toCodePoint((char) s, (char) c) : c;
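The same surrogate-pair logic can be tried without JLine or a terminal. The following is a minimal sketch using only `java.io`; the class name `CodePointReader` and its methods are hypothetical, and the timeout handling from the snippet above is left out. It combines a high surrogate with the low surrogate that follows it, and pushes the second char back if it turns out not to be a low surrogate.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns the chars delivered by a Reader into code points. */
final class CodePointReader {

    /** Reads one code point, or returns -1 at end of input. */
    static int readCodePoint(PushbackReader reader) throws IOException {
        int c = reader.read();
        if (c < 0 || !Character.isHighSurrogate((char) c)) {
            return c; // end of input, or a char that is a complete code point by itself
        }
        int low = reader.read();
        if (low >= 0 && Character.isLowSurrogate((char) low)) {
            return Character.toCodePoint((char) c, (char) low);
        }
        if (low >= 0) {
            reader.unread(low); // not a pair: put the second char back
        }
        return c; // unpaired high surrogate, returned unchanged
    }

    /** Decodes a whole string into its code points. */
    static List<Integer> decodeAll(String text) throws IOException {
        List<Integer> codePoints = new ArrayList<>();
        try (PushbackReader reader = new PushbackReader(new StringReader(text))) {
            int cp;
            while ((cp = readCodePoint(reader)) != -1) {
                codePoints.add(cp);
            }
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        // U+10437 (a surrogate pair), a space, and U+1E83, written as escapes
        System.out.println(decodeAll("\uD801\uDC37 \u1E83")); // [66615, 32, 7811]
    }
}
```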

answered Apr 25 '18 at 08:06

Guillaume Nodet


Thanks. Sorry, I'm struggling to understand this. I think you are the main JLine author. Perhaps you might take a look at my other JLine question (https://stackoverflow.com/questions/50009594/encoding-issue-with-jline?noredirect=1&lq=1). Are you saying that JLine can handle, at most, 2-byte UTF-8 Unicode, and even then you have to use `BindingReader`? How might I use `BindingReader`...? I mean how do I get one from the `terminal` instance? – mike rodent Apr 25 '18 at 08:41
I tried wrapping a `NonBlockingReader` (NBR) in a `BindingReader`... but then I see that `BindingReader` has no `read` method, so your code above appears to apply to the wrapped `NBR`. But then `isHighSurrogate` never returns `true` when I run your code, e.g. when I enter "é". Confused! – mike rodent Apr 25 '18 at 08:51
`isHighSurrogate` will only return `true` for the first `char` of a surrogate pair, i.e. for characters with a code point greater than `0xFFFF`. Try with `𐐷` for example. – Guillaume Nodet Apr 26 '18 at 19:03
I'm sorry, but I really don't see what you're trying to do with JLine. On the encoding problem, there may be a problem in your other question, but the `NonBlockingReader` behaves like a `Reader`. The encoding questions you're asking are the same with a plain reader. Try playing with ``` – Guillaume Nodet Apr 26 '18 at 19:09

mike rodent · Answer 2 · 2018-04-28T09:03:45.360

I have found a Cygwin-specific solution to this... and also what may be (?) the only way to intercept, isolate and identify "keyboard control" character input.

Getting correct Unicode input using JLine and Cygwin
As referenced here in my own answer to a question I asked a year ago, Cygwin (in my setup anyway) needs some sort of extra buffering and encoding, both for console input and output, if it is to handle Unicode properly.

To apply this AND to apply JLine at the same time, I do this, after calling terminal.enterRawMode():

BufferedReader br = new BufferedReader( new InputStreamReader( terminal.input(), 'UTF-8' ))

NB terminal.input() returns an org.jline.utils.NonBlockingInputStream instance.

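Entering "ẃ" (AltGr + W on a UK Extended keyboard) is then consumed in a single br.read() call, and the int value produced is 7811 (U+1E83), the correct code point. Hurrah: a character that takes three bytes in UTF-8 has been correctly consumed. (Note that U+1E83 is still inside the BMP, the Basic Multilingual Plane, which covers U+0000 to U+FFFF; characters outside the BMP arrive from a Reader as two surrogate chars.)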
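The effect of wrapping the input stream in a UTF-8 `InputStreamReader` can be reproduced without a terminal. In this minimal sketch a `ByteArrayInputStream` stands in for `terminal.input()` (that substitution, and the class name `Utf8ReadDemo`, are assumptions made for illustration); the three bytes are the UTF-8 encoding of "ẃ".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the same bytes read as raw bytes and as UTF-8 decoded chars. */
final class Utf8ReadDemo {

    /** Decodes the bytes as UTF-8 and returns the first char read. */
    static int firstChar(byte[] utf8Bytes) throws IOException {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wAcute = {(byte) 0xE1, (byte) 0xBA, (byte) 0x83}; // UTF-8 for U+1E83

        // Reading the stream directly yields one value per byte.
        InputStream raw = new ByteArrayInputStream(wAcute);
        System.out.println(raw.read() + " " + raw.read() + " " + raw.read()); // 225 186 131

        // Reading through a UTF-8 decoder yields one value per char.
        System.out.println(firstChar(wAcute)); // 7811
    }
}
```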

Handling keyboard control character bytes:
But I also want to intercept, isolate and correctly identify bytes corresponding to various control characters. TAB is one-byte (9), BACKSPACE is one-byte (127), so easy to deal with, but UP-ARROW is delivered in the form of 3 separately-read bytes, i.e. three separate br.read() commands are unblocked, even using the above BufferedReader. Some control sequences contain 7 such bytes, e.g. Ctrl-Shift-F5 is 27 (escape) followed by 6 other separately read bytes, int values: 91, 49, 53, 59, 54, 126. I haven't yet found where such sequences may be documented: if anyone knows please add a comment.

It is then necessary to isolate these "grouped bytes": i.e. you have a stream of bytes: how do you know that these 3 (or 7...) have to be interpreted jointly?
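An alternative to timing, sketched here under an assumption, is to use the structure of the sequence itself. The assumption is that the terminal emits ECMA-48 style CSI sequences, as xterm-compatible terminals generally do: ESC (27), then `[` (91), then parameter or intermediate bytes in the range 0x20 to 0x3F, then one final byte in the range 0x40 to 0x7E. The class `CsiSequence` is a hypothetical name, not a JLine API. Under that convention the seven values listed above read as ESC [ 15 ; 6 ~, which in xterm's scheme is F5 with modifier 6 (Shift+Ctrl).

```java
/** Hypothetical helper: recognises one CSI sequence (ESC [ ... final byte) by its structure. */
final class CsiSequence {
    final String parameters; // parameter and intermediate bytes, e.g. "15;6"
    final char finalByte;    // byte that ends the sequence, e.g. '~' or 'A'
    final int length;        // number of input codes the sequence occupies

    private CsiSequence(String parameters, char finalByte, int length) {
        this.parameters = parameters;
        this.finalByte = finalByte;
        this.length = length;
    }

    /** Parses a sequence starting at codes[0]; returns null if it is not a complete CSI sequence. */
    static CsiSequence parse(int[] codes) {
        if (codes.length < 3 || codes[0] != 27 || codes[1] != '[') {
            return null;
        }
        StringBuilder params = new StringBuilder();
        for (int i = 2; i < codes.length; i++) {
            int b = codes[i];
            if (b >= 0x40 && b <= 0x7E) {
                return new CsiSequence(params.toString(), (char) b, i + 1); // final byte reached
            }
            if (b < 0x20 || b > 0x3F) {
                return null; // neither a parameter nor an intermediate byte
            }
            params.append((char) b);
        }
        return null; // input ended before the final byte arrived
    }

    public static void main(String[] args) {
        // The Ctrl-Shift-F5 values quoted in the text
        CsiSequence f5 = parse(new int[] {27, 91, 49, 53, 59, 54, 126});
        System.out.println(f5.parameters + " " + f5.finalByte + " " + f5.length); // 15;6 ~ 7

        // ESC [ A, the usual encoding of UP-ARROW (an assumption; the text gives no values)
        CsiSequence up = parse(new int[] {27, 91, 65});
        System.out.println("[" + up.parameters + "] " + up.finalByte + " " + up.length); // [] A 3
    }
}
```

This does not remove the need for timing altogether: pressing the Escape key on its own also delivers a single 27, so some timeout is still needed to tell a lone Escape from the start of a sequence, and some keys use introducers other than ESC [.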

This is possible by taking advantage of the fact that when multiple bytes are delivered for a single such control character they are delivered with less than one millisecond between each. Not that surprisingly perhaps. This Groovy script seems to work for my purposes:

import org.apache.commons.lang3.StringUtils
@Grab(group='org.jline', module='jline', version='3.7.0')
@Grab(group='org.apache.commons', module='commons-lang3', version='3.7')
def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()

terminal.enterRawMode()
// BufferedReader needed for correct Unicode input using Cygwin
BufferedReader br = new BufferedReader( new InputStreamReader(terminal.input(), 'UTF-8' ))
// PrintStream needed for correct Unicode output using Cygwin
outPS = new PrintStream(System.out, true, 'UTF-8' )
userResponse = ''
int readInt
boolean continueLoop = true

while( continueLoop ) {
    readInt = br.read()
    while( readInt == 27 ) {
        println "escape"
        long startNano = System.nanoTime()
        long nanoDiff = 0
        // figure of 500000 nanoseconds arrived at by experimentation: see below
        while( nanoDiff < 500000 ) {
            readInt = br.read()  
            long timeNow = System.nanoTime()
            nanoDiff = timeNow - startNano
            println "z readInt $readInt char ${(char)readInt} nanoDiff $nanoDiff"
            startNano = timeNow
        }
    }
    switch( readInt ) {
        case [10, 13]:
            println ''
            continueLoop = false
            break
        case 9:
            println '...TAB'
            continueLoop = false
            break
        case 127:
            // backspace
            if( ! userResponse.empty ) {
                print '\b \b'
                // chop off last character
                userResponse = StringUtils.chop( userResponse )
            }
            break
        default:
            char unicodeChar = (char)readInt
            outPS.print( unicodeChar )
            userResponse += unicodeChar
    }
}
outPS.print( "userResponse |$userResponse|")
br.close()
terminal.close()

The above code enables me to successfully "isolate" the individual multi-byte keyboard control characters.

The 3 dots in the println "...TAB" line are printed on the same line, immediately after the user has pressed TAB (which with the above code is not printed on the input line). This opens the door to doing things like "autocompletion" of lines as in certain BASH commands...

Is this setting of 500000 nanoseconds (0.5 ms) fast enough? Maybe!

The fastest typists can type at 220 words per minute. Assuming an average of 8 characters per word (which seems high), this works out at about 29 characters per second, or approximately 34 ms per character, so in theory things should be OK. A "rogue" simultaneous pressing of two keys might possibly deliver them less than 0.5 ms apart; however, with the above code this only matters if both of them produce escape sequences. It seems to work OK. According to my experiments the threshold can't really be much less than 500000 ns, because it can take up to 70000 - 80000 ns between each byte in a multi-byte sequence (although it usually takes less), and all sorts of interrupts or funny things happening might of course interfere with delivery of these bytes. In fact setting it to 1000000 (1 ms) seems to work fine.
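The arithmetic behind that estimate, as a small sketch (the class and method names are hypothetical; the figures of 220 words per minute, 8 characters per word and the 0.5 ms threshold are the ones used above):

```java
/** Hypothetical helper: the typing-speed arithmetic from the paragraph above. */
final class TypingGap {

    /** Average time between key presses, in milliseconds. */
    static double millisPerChar(double wordsPerMinute, double charsPerWord) {
        double charsPerSecond = wordsPerMinute * charsPerWord / 60.0;
        return 1000.0 / charsPerSecond;
    }

    public static void main(String[] args) {
        double gapMs = millisPerChar(220, 8); // 220 * 8 / 60 is about 29.3 chars per second
        double thresholdMs = 0.5;             // 500000 ns
        System.out.printf("gap between keys: %.1f ms%n", gapMs);
        System.out.printf("ratio to threshold: %.0fx%n", gapMs / thresholdMs);
    }
}
```

Even for a very fast typist the average gap between keys is therefore well over an order of magnitude larger than the threshold.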

NB we now seem to have a problem with the above code if we want to intercept and deal with escape sequences: the above code blocks on br.read() inside the nanoDiff while loop at the end of the escape sequence. This is OK though because we can track the bytes sequence we are receiving as that while loop happens (before it blocks).

score 1 · Accepted Answer · answered Apr 26 '18 at 19:20


Try playing with

 jshell> "𐐷 ẃ".getBytes()
 $1 ==> byte[8] { -16, -112, -112, -73, 32, -31, -70, -125 }

 jshell> "𐐷 ẃ".chars().toArray()
 $2 ==> int[4] { 55297, 56375, 32, 7811 }

 jshell> "𐐷 ẃ".codePoints().toArray()
 $3 ==> int[3] { 66615, 32, 7811 }

answered Apr 26 '18 at 19:20

Guillaume Nodet


An `InputStream` will return the first array, while a `Reader` will return the second one. If what you want is the third one, you need to use the `isHighSurrogate` method as I indicated. Fwiw, this has nothing to do with JLine. – Guillaume Nodet Apr 26 '18 at 19:22
Thanks. Again, I seem not to understand what JLine is for, or maybe I'm using an unusual setup (W10 and Cygwin). As I type in the "ẃ" character, for example, using the UK Extd keyboard, this is Alt-Gr + w. A `BindingReader` (wrapping a `NonBlockingReader`) delivers 3 `int`s from `readCharacter`: 195, 169, 225. I'd love to see an actual example where `BindingReader.readCharacter` finds an incoming character at line `c = reader.read(100L);` which tests true for `isHighSurrogate`. – mike rodent Apr 26 '18 at 20:20
But you say that this is not what JLine is for ... I simply don't understand. What I want to do is correctly read Unicode console input, character by character. – mike rodent Apr 26 '18 at 20:21
That will be the case if you use `𐐷`, as `Character.isHighSurrogate((char) 55297) == true`. There may be some encoding problems in JLine, or more probably, the wrong encoding is used by JLine. As for your usage of JLine... I'm not sure I understand what you're trying to achieve, so I'd rather try to separate the encoding problem (try to use `System.in` and an `InputStreamReader`) from the JLine problem (and try to explain what you're trying to do). – Guillaume Nodet Apr 26 '18 at 20:59
Aha... this *may* be related to a question I asked and then answered a year ago relating to Cygwin and encoding: https://stackoverflow.com/a/41945986/595305. What I'm trying to achieve: simply read Unicode input correctly: it seems likely that I'm getting different input to you because I'm using Cygwin... NB tomorrow I'll try to make a version of your `BindingReader` that somehow incorporates `BufferedReader`... (or vice versa). – mike rodent Apr 26 '18 at 23:58
Yes, the correct encoding needs to be used; it can be specified on the TerminalBuilder. On the JLine usage, please explain what you’re trying to do and I can point you in the right direction, but a BufferedReader won’t help here, I think. – Guillaume Nodet Apr 27 '18 at 08:41
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/169939/discussion-between-guillaume-nodet-and-mike-rodent). – Guillaume Nodet Apr 27 '18 at 10:51
