Invert string's lexicographic order value

Question

Given a lexicographically sorted list of UTF-8 strings s1, s2, s3, ... of unknown length, is it possible to invert each string's value so that sorting the list lexicographically by the inverted values produces the reverse order?

function invert(s) {
    // TODO: what's here?
    return s;
}

const sample = ['', ' ', 'a', 'A', '@', '한','자', '한자', '자한'];

const original = [...sample].sort((a, b) => {
    return a.localeCompare(b);
});

const inverted = [...sample].sort((a, b) => {
    return invert(a).localeCompare(invert(b));
});

// Both should print the same.
console.log('original', original);
console.log('inverted.reverse', inverted.reverse());

Did you really mean to use `localeCompare` (and if yes, which locale are you using?), or did you mean to compare the string by individual charcodes (or codepoints)? — Bergi, Jul 23 '23 at 04:48
@Bergi I guess I mean to compare the string by individual charcodes--didn't think this through properly. — Code, Jul 23 '23 at 11:10

Bergi · Accepted Answer · 2023-07-23T12:40:32.100

"I mean to compare the string by individual charcodes"

Then you'll need to use a normal comparison instead of .localeCompare. But yes, it's then possible to invert the string by inverting every individual charcode. The result is neither a readable string nor valid UTF-8, but it can be used for the comparison.

function invert(s) {
    // Flip every UTF-16 code unit (0xFFFF - code), then append \uFFFF as an end marker.
    return String.fromCharCode(...s.split('').map(c =>
        0xFFFF - c.charCodeAt(0)
    ), 0xFFFF);
}

const sample = ['', ' ', 'a', 'A', '@', '한','자', '한자', '자한'];
console.log(sample.map(invert));

const original = [...sample].sort((a, b) => {
    return +(a>b)-(a<b);
});

const inverted = [...sample].sort((a, b) => {
    return +(invert(a)>invert(b))-(invert(a)<invert(b));
});

// Both should print the same.
console.log('original', original);
console.log('inverted.reverse', inverted.reverse());

To avoid shorter strings still being sorted first, we append \uFFFF. In theory this would have to be an infinitely long run of \uFFFF (see "An analogue for (-/+)Infinity for characters in JavaScript"). For simplicity, I just assume that no strings end in \u0000 :-)
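A minimal sketch of why that assumption matters: with a single appended \uFFFF, a string and the same string followed by \u0000 invert to values that still compare in the original direction. The names invertSimple and cmp are hypothetical, chosen only for this illustration; invertSimple has the same body as the invert function above.

```javascript
// Same logic as the invert function above: flip every UTF-16 code unit, then append \uFFFF.
// (The name invertSimple is an assumption made for this sketch.)
function invertSimple(s) {
    return String.fromCharCode(...s.split('').map(c =>
        0xFFFF - c.charCodeAt(0)
    ), 0xFFFF);
}

// Code-unit comparison, as used in the sort callbacks above.
const cmp = (a, b) => +(a > b) - (a < b);

// 'a' sorts before 'a\u0000' in the original order...
console.log(cmp('a', 'a\u0000')); // -1

// ...so the inverted values should compare as 1, but \u0000 inverts to \uFFFF,
// which makes invertSimple('a') a prefix of invertSimple('a\u0000'):
console.log(cmp(invertSimple('a'), invertSimple('a\u0000'))); // -1
```

Both comparisons print -1, so this pair is not reversed.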

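To do it properly, you could double the size of each string so that there is enough coding space for an "end of string" mark that sorts after any "normal" character. Below, each inverted character is prefixed with a space, and the string ends with '$', which compares greater than a space: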

function invert(s) {
    return s.split('').map(c =>
        ' ' + String.fromCharCode(0xFFFF - c.charCodeAt(0))
    ).join('') + '$';
}

const sample = ['', '\u0000', '\u0000\u0000', '.', '.\u0000', '.\uFFFF', 'a', 'A', '@', '한','자', '한자', '자한'];
console.log(sample.map(invert));

const original = [...sample].sort((a, b) => {
    return +(a>b)-(a<b);
});

const inverted = [...sample].sort((a, b) => {
    return +(invert(a)>invert(b))-(invert(a)<invert(b));
});

// Both should print the same.
console.log('original', original);
console.log('inverted.reverse', inverted.reverse());
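
As a rough sanity check (a sketch, not a proof), the doubled-size invert can be tested exhaustively on all short strings over a small alphabet that includes the edge cases \u0000 and \uFFFF. The alphabet, the maximum length of 3, and the names cmp, strings, frontier and violations are assumptions made for this illustration.

```javascript
// Doubled-size inversion, same as above: each inverted code unit is prefixed
// with ' ', and '$' (which compares greater than ' ') marks the end of the string.
function invert(s) {
    return s.split('').map(c =>
        ' ' + String.fromCharCode(0xFFFF - c.charCodeAt(0))
    ).join('') + '$';
}

// Code-unit comparison, as used in the sort callbacks above.
const cmp = (a, b) => +(a > b) - (a < b);

// Build every string of length 0..3 over a small alphabet with the edge cases.
const alphabet = ['\u0000', 'a', '\uFFFF'];
let strings = [''];
let frontier = [''];
for (let len = 1; len <= 3; len++) {
    frontier = frontier.flatMap(prefix => alphabet.map(c => prefix + c));
    strings = strings.concat(frontier);
}

// For every pair, the inverted comparison should be the negated original one.
let violations = 0;
for (const a of strings) {
    for (const b of strings) {
        if (cmp(invert(a), invert(b)) !== -cmp(a, b)) violations++;
    }
}

console.log(strings.length, violations); // 40 0
```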

Is an "infinite string" required to support `\u0000`? Would it work if the strings are first encoded to a smaller charset (e.g. `a` becomes `00097` and `\u0000` becomes `00000`)? — Code, Jul 23 '23 at 12:37
@Code I tried to find a way where still every character is mapped to one other character, just with some clever subtraction of numerical values, but couldn't come up with a solution. I found a way to do it using longer strings, though — Bergi, Jul 23 '23 at 12:42
