Extending selection to Word boundary on Unicode text

Question

I am using this method, as described here. It extends the selection to the word boundary correctly on English-like text, but it behaves oddly on Unicode text. With Hindi text, for example, it sometimes selects multiple words and sometimes selects words far away from the cursor.

This behaviour occurs in both Firefox and Chrome. My question is: how should I handle Unicode text so that the selection extends to the word boundary?

JSFiddle Link

Sample Code:

function snapSelectionToWord() {
    var sel;

    // Check for existence of window.getSelection() and that it has a
    // modify() method. IE 9 has both selection APIs but no modify() method.
    if (window.getSelection && (sel = window.getSelection()).modify) {
        // if (!sel.isCollapsed) {

            // Detect if selection is backwards
            var range = document.createRange();
            range.setStart(sel.anchorNode, sel.anchorOffset);
            range.setEnd(sel.focusNode, sel.focusOffset);
            var backwards = range.collapsed;
            range.detach();

            // modify() works on the focus of the selection
            var endNode = sel.focusNode, endOffset = sel.focusOffset;
            sel.collapse(sel.anchorNode, sel.anchorOffset);
            if (backwards) {
                //sel.modify("move", "backward", "character");
                sel.modify("move", "forward", "word");
                sel.extend(endNode, endOffset);
                //sel.modify("extend", "forward", "character");
                sel.modify("extend", "backward", "word");

            } else {
                sel.modify("move", "forward", "character");
                sel.modify("move", "backward", "word");
                sel.extend(endNode, endOffset);
                sel.modify("extend", "backward", "character");
                sel.modify("extend", "forward", "word");
            }
        //}
    } else if ( (sel = document.selection) && sel.type != "Control") {
        var textRange = sel.createRange();
        if (textRange.text) {
            textRange.expand("word");
            // Move the end back to not include the word's trailing space(s),
            // if necessary
            while (/\s$/.test(textRange.text)) {
                textRange.moveEnd("character", -1);
            }
            textRange.select();
        }
    }
}

First, I don't know anything about Hindi scripts in general, nor about Devanagari in particular. I think I can reproduce this on words like `का`. But I actually have an issue with this exact word in almost any input I can find on my macOS system: moving the cursor with the keyboard treats it as if it were a single character. So if the OS has some issue with this character set, it would seem normal that browsers suffer from it too. Now, in Hangul there are combining characters that together produce a syllable: `ᄏ` and `ᅡ` together produce `카`, which is still two characters. — Kaiido, Jan 22 '19 at 07:02
There, the same cursor behavior can be seen, but it can also be fixed by using the single `카` character. So once again, I don't know anything about Hindi scripts, but maybe something similar is happening here, and you would have to convert all these combined character sequences to single precomposed characters where they exist. — Kaiido, Jan 22 '19 at 07:02
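
The conversion suggested in the comment above corresponds to Unicode normalization (NFC), which can be tried with the standard `String.prototype.normalize`. The following is a minimal sketch, not a fix for the question's code. It assumes the Hangul pair is the conjoining jamo U+110F and U+1161, and that `का` is U+0915 followed by U+093E. `describeNfc` is a hypothetical helper name introduced only for this illustration.

```javascript
// Hypothetical helper (not part of the question's code): reports how many
// code points a string has before and after NFC normalization.
function describeNfc(text) {
    var composed = text.normalize("NFC");
    return {
        before: Array.from(text).length,     // code points in the input
        after: Array.from(composed).length,  // code points after NFC
        changed: composed !== text,
        composed: composed
    };
}

// Assumption: the Hangul example is U+110F + U+1161 (conjoining jamo).
var hangul = describeNfc("\u110F\u1161");

// Assumption: the Hindi example is U+0915 (KA) + U+093E (vowel sign AA).
var devanagari = describeNfc("\u0915\u093E");

console.log(hangul.before, hangul.after, hangul.changed);             // 2 1 true
console.log(devanagari.before, devanagari.after, devanagari.changed); // 2 2 false
```

Under these assumptions the Hangul pair collapses to the single code point U+CE74, while the Devanagari pair stays at two code points because Unicode defines no precomposed form for it. Normalizing the text may therefore help in the Hangul case but is unlikely, on its own, to change how `modify()` behaves on Hindi text.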
