
I discovered this problem in TSR's post.

What looks at first glance like an equality is distorted by the fact that there are different types of spaces in HTML (and in typesetting), such as non-breaking spaces, en spaces, etc.

This ends up introducing a kind of logical trap for a simple equality test.
For a computer, the equality test only compares the encoded sequence of characters, whereas for a human reader the difference in how the spaces are encoded is irrelevant.
It is not a question of mathematics but of how ordinary language is understood.
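
To illustrate the point, here is a minimal sketch that writes the same label with several different space characters and compares each one to the plain-space version. The `variants` object and its sample strings are made up for this illustration and do not come from TSR's post.

```javascript
// Hypothetical samples: the same label written with four different space characters.
const variants = {
  'U+0020 space': 'USD\u00201,234.12',
  'U+00A0 no-break space': 'USD\u00A01,234.12',
  'U+2002 en space': 'USD\u20021,234.12',
  'U+202F narrow no-break space': 'USD\u202F1,234.12',
};

const reference = variants['U+0020 space'];

for (const [name, text] of Object.entries(variants)) {
  // Code point of the character right after "USD", formatted as U+XXXX.
  const hex = text.codePointAt(3).toString(16).toUpperCase().padStart(4, '0');
  console.log(
    name,
    '| index 3 is U+' + hex,
    '| === reference:', text === reference,
    '| matched by \\s:', /\s/.test(text[3])
  );
}

// The zero-width space U+200B is not part of what \s matches in JavaScript.
const zeroWidthMatched = /\s/.test('\u200B');
console.log('U+200B matched by \\s:', zeroWidthMatched);
```

Only the first variant is strictly equal to the reference, although all four are matched by `\s`. The zero-width space is a separate case that a `\s`-based replacement would not cover.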

I solved it as follows, but I am not too sure of my solution. I wonder whether there is a better approach, or whether JS already has a native function for this (I searched but did not find one). My mastery of regular expressions is also far from perfect.

const isTextEqual=(a,b)=>(a.replaceAll(/\s/g,' ')===b.replaceAll(/\s/g,' '))

var s1  = "USD 1,234.12"
var s1b = "USD   1,234.12"
var s2  = "USD\u00A01,234.12" // \u00A0 is a non-breaking space, written as an escape so it stays visible

console.log('s1 == s2 ->', s1 == s2) // is false because :
console.log(s1.codePointAt(3), s2.codePointAt(3));  // 32 , 160 is non breaking space

console.log('isTextEqual(s1,s2)->',  isTextEqual(s1,s2) ) // true (as expected)
console.log('isTextEqual(s1b,s2)->',  isTextEqual(s1b,s2) ) // false (as expected ?)

Do you have a better way to code this kind of equality test on text, one that ignores how the space characters are encoded?

Mister Jojo
  • "which ultimately finally introduces a sort of logical loophole for a simple equality test." -- not really, the strings really aren't equal. _Looking equal_ to a human eye is different than equality, which is a character-by-character comparison of the two strings, regardless of whether the character is visible or not. – ggorlen Apr 11 '21 at 18:30
  • @ggorlen I wrote a **"sort of"**, and we must admit that TSR fell into this trap – Mister Jojo Apr 11 '21 at 18:33
  • I don't see this as a "trap" or a "distortion". You seem to presuppose computers should use "first glance" (human vision) to determine equality. `===` is not broken or misbehaving simply because two strings look visibly to be the same. I don't recommend trying to "fix" `===` and introducing new kinds of confusing behavior. Your `isEqual` implementation above thinks `"a b"` is "equal" to `"a\nb"` which makes no sense. The trap TSR fell into is trusting their eyes instead of JavaScript. – ggorlen Apr 11 '21 at 18:37
  • You don't need to use `replaceAll`: `function isEqual(a,b){return a.replace(/\s/g,'')===b.replace(/\s/g,'');}` works as well. – Unmitigated Apr 11 '21 at 18:38
  • Does this answer your question? [Remove zero-width space characters from a JavaScript string](https://stackoverflow.com/questions/11305797/remove-zero-width-space-characters-from-a-javascript-string) – ggorlen Apr 11 '21 at 18:41
  • See also [Consider displaying zero-width space characters in code blocks](https://meta.stackoverflow.com/questions/351807/consider-displaying-zero-width-space-characters-in-code-blocks) – ggorlen Apr 11 '21 at 18:47
  • @ggorlen `==` equality test doesn't work better – Mister Jojo Apr 11 '21 at 21:51
  • @user14063792468 The question is in the title: how to do equality tests without considering the different types of spaces in a string? – Mister Jojo Apr 11 '21 at 22:15
  • Visual equality is indeed sometimes useful, but it's hard, in many ways you haven’t yet gotten to. For example, there are many lookalike characters other than spaces, there are multiple byte sequences that can represent exactly the same character, you can even write the string backwards then reverse it, all in Unicode. I recommend trying to find a sufficient subset of this problem to solve rather than solve it generally, for example by requiring the same input format, or pulling out only the number then comparing that. In TSR’s case, the solution is to rewrite the test to the expected output. – twhb Apr 11 '21 at 22:25
  • "*it does not take into account the number of consecutive spaces*" - do you expect it to take into account different numbers of spaces, or to ignore them (and treat `" "` as equal to `" "`)? For the latter, use `.replace(/\s+/g, ' ')` – Bergi Apr 11 '21 at 22:32
  • @Bergi you are right, I was confused – Mister Jojo Apr 11 '21 at 23:40
  • @twhb I agree, but for this case, the question is only asked for spaces, not for other characters (with or without diacritics for example) – Mister Jojo Apr 11 '21 at 23:46
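
The two suggestions in the comments above (Bergi's `\s+` and ggorlen's objection that `"a b"` should not equal `"a\nb"`) could be sketched roughly as follows. The names `normalizeSpaces`, `isTextEqualLoose`, `HORIZONTAL_SPACES` and `isTextEqualSameLine` are hypothetical and appear nowhere in the question. The character class is an assumption about which characters should count as "a space": it lists U+0020, U+00A0 and the other Unicode space separators, and deliberately leaves out tabs and line breaks.

```javascript
// Hypothetical helpers, not part of the original question.

// Loose variant: any run of whitespace matched by \s counts as one plain space.
const normalizeSpaces = (s) => s.replace(/\s+/g, ' ');
const isTextEqualLoose = (a, b) => normalizeSpaces(a) === normalizeSpaces(b);

// Stricter variant (assumption): only space separator characters are unified,
// one for one, so tabs, line breaks and the number of spaces still matter.
const HORIZONTAL_SPACES = /[ \u00A0\u1680\u2000-\u200A\u202F\u205F\u3000]/g;
const isTextEqualSameLine = (a, b) =>
  a.replace(HORIZONTAL_SPACES, ' ') === b.replace(HORIZONTAL_SPACES, ' ');

const s1 = 'USD 1,234.12';        // plain space
const s1b = 'USD   1,234.12';     // three plain spaces
const s2 = 'USD\u00A01,234.12';   // non-breaking space

console.log(isTextEqualLoose(s1, s2));              // true
console.log(isTextEqualLoose(s1b, s2));             // true, runs are collapsed
console.log(isTextEqualLoose('a b', 'a\nb'));       // true, \s also matches \n

console.log(isTextEqualSameLine(s1, s2));           // true
console.log(isTextEqualSameLine(s1b, s2));          // false, three spaces vs one
console.log(isTextEqualSameLine('a b', 'a\nb'));    // false, line break kept distinct
```

Which variant fits depends on what "equal" is supposed to mean for the text being compared. Neither one handles zero-width characters or lookalike characters other than spaces.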
