
I am getting a string, string2, from an API, and it has some unusual spacing. The spaces in string2 are not regular spaces (I don't even know whether they are tabs), and even after I try to replace them it is still not equal to string1, which has regular spaces.

// This string has normal spaces charCodeAt(4) displays '32'
const string1 = 'long string with spaces'
// This string has different spaces charCodeAt(4) displays '160'
const string2 = 'long string with spaces'.replace(/\s+/g, ' ')

console.log(string1)
console.log(string2)
console.log(string1 === string2)

--- Update

The problem was that string1 contained a mixture of normal spaces and non-breaking spaces, so it would never be equal to string2 no matter how I changed string2.

Since I do control string1, I corrected it to use normal spaces, and now it works.
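The non-breaking spaces did not survive being pasted into this post, so the literals above look identical. The situation from the update can be reproduced with explicit `\u00a0` escapes instead. This is a sketch: the position of the stray character in string1 is an assumption chosen for illustration.

```javascript
// Assumption: string1 contains one stray non-breaking space (U+00A0), as the update describes.
const string1 = 'long string\u00a0with spaces'
const string2 = 'long\u00a0string\u00a0with\u00a0spaces'.replace(/\s+/g, ' ')

console.log(string2 === 'long string with spaces') // true: \s matches U+00A0 in JavaScript
console.log(string1 === string2) // false: the stray U+00A0 in string1 is untouched
console.log(string1.replace(/\s+/g, ' ') === string2) // true once both sides are normalized
```

Normalizing both strings with the same replacement, rather than only the one from the API, removes the dependency on which side happens to contain the odd character.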

Álvaro
  • that will return true not false – marcos Jan 30 '21 at 10:56
  • [This answer](https://stackoverflow.com/a/1496863) may help. – sbgib Jan 30 '21 at 10:56
  • This probably means you have some invisible character in either string (maybe in the non-ASCII range). I turned your code into a snippet and it returns true. So please provide code in your question that reproduces the issue. – trincot Jan 30 '21 at 10:58
  • @trincot, thanks, I have amended it; it now says false – Álvaro Jan 30 '21 at 10:59
  • Check the `.charCodeAt()` of the characters. You will find some "spaces" with a code of 160, which is a non-breaking space (`\u00A0`) – Andreas Jan 30 '21 at 11:01
  • Indeed, inspection of string2 shows that it doesn't have regular spaces. If you had applied the same replacement to it, the strings would have been equal. – trincot Jan 30 '21 at 11:04
  • @Andreas yeah, I get `32` for the first string and `160` for the second. How can I force the second string to change the space to a normal space? – Álvaro Jan 30 '21 at 11:12
  • Oh, I see `string1` also has a bad space; that is why it will never be true – Álvaro Jan 30 '21 at 11:22
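Checking `.charCodeAt()` one index at a time, as suggested in the comments, can be done for a whole string at once. A minimal sketch: `findUnusualChars` is a hypothetical helper, and treating every code outside 32 to 126 (printable ASCII) as suspicious is an assumption.

```javascript
// Hypothetical helper: report the index and char code of every character outside
// printable ASCII (32 to 126), so invisible characters such as U+00A0 become visible.
function findUnusualChars(str) {
  const found = []
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i)
    if (code < 32 || code > 126) {
      found.push({ index: i, code })
    }
  }
  return found
}

console.log(findUnusualChars('long\u00a0string with spaces')) // → [{ index: 4, code: 160 }]
```

Running it on both strings before comparing them would have shown that string1 also contained a code 160.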

3 Answers

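Code point 160 (\u00a0) is the non-breaking space.

If you don't need to support IE, you can use the Unicode property escape /\p{White_Space}+/gu instead of /\s+/. It matches \u00a0 along with every other character that has the Unicode White_Space property. (In JavaScript, \s also matches \u00a0; the two differ in that \s additionally matches \ufeff, while \p{White_Space} additionally matches \u0085.)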
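A short sketch of applying that regex to both sides before comparing. The helper name `normalizeSpaces` is hypothetical; the regex is the one from this answer.

```javascript
// Hypothetical helper: collapse every run of Unicode whitespace
// (including U+00A0) into a single ASCII space.
const normalizeSpaces = str => str.replace(/\p{White_Space}+/gu, ' ')

const plain = 'long string with spaces'
const withNbsp = 'long\u00a0string\u00a0with\u00a0spaces'

console.log(plain === withNbsp) // false
console.log(normalizeSpaces(plain) === normalizeSpaces(withNbsp)) // true
```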

If you need to support IE, you can instead generate your own whitespace-matching regex in an environment that does support Unicode property escapes. For example, run the following in the Chrome browser console:

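// Format a code point as a \uXXXX escape.
const toUnicodeEscape = x => '\\u' + x.toString(16).padStart(4, '0')

const last = arr => arr.slice(-1)[0]

// Collect the BMP code points with the White_Space property,
// grouping consecutive code points into runs.
const charGroupings = [...new Array(0xffff).keys()]
    .map(k => String.fromCodePoint(k))
    .filter(x => /^\p{White_Space}+$/u.test(x))
    .map(x => x.codePointAt(0))
    .reduce((acc, n) => {
        const prev = last(acc)

        if (prev && last(prev) === n - 1) {
            prev.push(n) // extends the current run
        } else {
            acc.push([n]) // starts a new run
        }

        return acc
    }, [])
    // Runs of one or two code points are listed individually; longer runs become ranges.
    .map(x => x.length <= 2
        ? x.map(toUnicodeEscape).join('')
        : `${toUnicodeEscape(x[0])}-${toUnicodeEscape(last(x))}`)
    .join('')

new RegExp(`[${charGroupings}]+`, 'g')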

Generates the regex /[\u0009-\u000d\u0020\u0085\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/g, which is exactly equivalent to /\p{White_Space}+/gu.
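The claimed equivalence can be spot-checked by testing both patterns against every BMP code point and collecting any disagreement. This is a verification sketch, not part of the original answer; it needs an engine with Unicode property escapes (for example a current browser or Node.js), since it compares against `\p{White_Space}` directly.

```javascript
// The IE-compatible class generated above, without the + quantifier.
const ieWhitespace = /[\u0009-\u000d\u0020\u0085\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]/
const unicodeWhitespace = /\p{White_Space}/u

// Record every BMP code point (in hex) on which the two patterns disagree.
const mismatches = []
for (let cp = 0; cp <= 0xffff; cp++) {
  const ch = String.fromCharCode(cp)
  if (ieWhitespace.test(ch) !== unicodeWhitespace.test(ch)) {
    mismatches.push(cp.toString(16))
  }
}

console.log(mismatches) // an empty array means the two patterns agree on the whole BMP
```

Neither regex uses the `g` flag here, so `test` keeps no `lastIndex` state between calls.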

Lionel Rowe

One option is to turn every character with a char code above 126 (everything beyond printable ASCII) into a space:

function formatWhiteSpaces(string) {
  return string.split('')
    // Char codes above 126 are beyond printable ASCII and may render strangely:
    // String.fromCharCode(160) looks like " " and String.fromCharCode(173) looks like "".
    .map(a => (a.charCodeAt(0) < 127 ? a : ' '))
    .join('')
}
var string1 = formatWhiteSpaces('long string with spaces')
var string2 = formatWhiteSpaces('long string with spaces')
console.log(string1)
console.log(string2)
console.log(string1 === string2)
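One caveat worth testing: the threshold rewrites every non-ASCII character, not only the space-like ones, so accented letters are lost too. A minimal demonstration; `asciiOnly` is a hypothetical name for the same logic as `formatWhiteSpaces` above.

```javascript
// Same logic as formatWhiteSpaces above: any char code of 127 or more becomes a space.
function asciiOnly(string) {
  return string.split('').map(a => (a.charCodeAt(0) < 127 ? a : ' ')).join('')
}

console.log(asciiOnly('long\u00a0string')) // "long string"
console.log(asciiOnly('\u00c1lvaro'))      // " lvaro": the accented letter is replaced
console.log(asciiOnly('caf\u00e9'))        // "caf "
```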

In case that turns characters you actually want into spaces, you can target only the problematic codes instead: String.fromCharCode(160) is the character that looks like a space, and String.fromCharCode(173) is one that looks like nothing.

function formatWhiteSpaces(string) {
  return string.split('')
    .map(a => {
      if (a.charCodeAt(0) === 160) return ' ' // 160 (non-breaking space) looks like " "
      if (a.charCodeAt(0) === 173) return ''  // 173 (soft hyphen) looks like ""
      return a
    })
    .join('')
}
var string1 = formatWhiteSpaces('long string with spaces')
var string2 = formatWhiteSpaces('long string with spaces')
console.log(string1)
console.log(string2)
console.log(string1 === string2)
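The same targeted mapping can also be written with `String.prototype.replace` instead of splitting into an array. A sketch; `formatWhiteSpacesRegex` is a hypothetical name.

```javascript
// Same mapping as the second formatWhiteSpaces above, using regex replacement.
function formatWhiteSpacesRegex(string) {
  return string
    .replace(/\u00a0/g, ' ') // 160, non-breaking space -> regular space
    .replace(/\u00ad/g, '')  // 173, soft hyphen -> removed
}

console.log(formatWhiteSpacesRegex('long\u00a0string')) // "long string"
console.log(formatWhiteSpacesRegex('soft\u00adhyphen')) // "softhyphen"
```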
The Bomb Squad

If the strings contain only words separated by (possibly different kinds of) spaces, we can extract the words, rebuild each string with regular spaces, and then compare:

const cleanStr = str => [...str.matchAll(/\w+/g)].map(x => x[0]).join(' ')

// This string has normal spaces charCodeAt(4) displays '32'
const string1 = 'long string with spaces'
// This string has different spaces charCodeAt(4) displays '160'
const string2 = 'long string with spaces'

console.log(string1)
console.log(string2)
console.log(cleanStr(string1) === cleanStr(string2))
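Note that `\w` only matches ASCII letters, digits and `_`, so non-ASCII letters split words. A sketch of a Unicode-aware variant; `cleanStrUnicode` and its character class are assumptions, not part of the original answer.

```javascript
// Original cleanStr: \w matches only ASCII letters, digits and underscore.
const cleanStr = str => [...str.matchAll(/\w+/g)].map(x => x[0]).join(' ')

// Hypothetical variant: letters (\p{L}), combining marks (\p{M}), numbers (\p{N})
// and underscore. The u flag is required for \p{...} escapes.
const cleanStrUnicode = str =>
  [...str.matchAll(/[\p{L}\p{M}\p{N}_]+/gu)].map(x => x[0]).join(' ')

const input = '\u00c1lvaro\u00a0says hi'
console.log(cleanStr(input))        // "lvaro says hi"
console.log(cleanStrUnicode(input)) // "Álvaro says hi"
```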
Siva K V
  • This will work the same for spaces, `const cleanStr = str => str.replace(/\s+/g, ' ')`, without having to convert the string to an array – Álvaro Jan 30 '21 at 12:24
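The two versions agree on the strings from the question but are not fully interchangeable: extracting words also drops punctuation, while replacing whitespace keeps it. A short comparison; `viaWords` and `viaSpaces` are hypothetical names for the answer's `cleanStr` and the comment's version.

```javascript
const viaWords = str => [...str.matchAll(/\w+/g)].map(x => x[0]).join(' ')
const viaSpaces = str => str.replace(/\s+/g, ' ')

const spaced = 'long string with spaces'
const nbspSpaced = 'long\u00a0string\u00a0with\u00a0spaces'

// Both normalize the non-breaking spaces.
console.log(viaWords(spaced) === viaWords(nbspSpaced))   // true
console.log(viaSpaces(spaced) === viaSpaces(nbspSpaced)) // true

// They diverge once punctuation is involved.
console.log(viaWords('hello, world!'))  // "hello world"
console.log(viaSpaces('hello, world!')) // "hello, world!"
```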