9

I am encountering a strange issue while comparing two strings. Here is my code:

console.log(x == y);
console.log("'" + x + "'=='" + y + "'");
console.log(typeof(x));
console.log(typeof(y));

The console shows:

false 
'1Ä4±'=='1Ä4±' 
string
string

I guess my strings contain strange characters, so how should I compare them? I read "Javascript string comparison fails when comparing unicode characters", but in my case `x` and `y` come from the same source and have the same encoding.

little-dude

4 Answers

6

The Ä in your strings can be represented either as a single Unicode character (Latin Capital Letter A With Diaeresis, U+00C4) or as a composite sequence consisting of Latin Capital Letter A (U+0041) followed by a Combining Diaeresis (U+0308).

There also might be any number of Zero-Width Spaces (U+200B), as well as other "invisible" characters in your strings.

Therefore, both strings may render the same, but actually be different.
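To make the difference concrete, here is a minimal sketch that builds both spellings of Ä with explicit escapes and compares them. `equalNormalized` is a hypothetical helper name, and it assumes an engine that supports `String.prototype.normalize` (ES2015 or later).

```javascript
// Two spellings of "Ä": precomposed vs. "A" followed by a combining diaeresis.
const composed = "\u00C4";
const decomposed = "A\u0308";

console.log(composed === decomposed);            // false: the code units differ
console.log(composed.length, decomposed.length); // 1 2

// Hypothetical helper: compare after normalizing both sides to NFC.
function equalNormalized(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(equalNormalized(composed, decomposed)); // true
```

Note that normalization only reconciles equivalent spellings. It does not strip invisible characters such as U+200B, so `equalNormalized("a\u200B", "a")` is still `false`.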

Frédéric Hamidi
  • 3
    Exactly, don't compare strings by printing them. – Ivan Kuckir May 28 '13 at 19:34
  • You must be right. In the comments under my initial post, when comparing the lengths of the strings there is a difference. – little-dude May 28 '13 at 19:41
  • Actually `x` is generated and stored by [an openpgp library](https://github.com/openpgpjs), and `y` is taken from `chrome.storage.local`. Maybe the string gets changed when I store it with the Chrome API... @IvanKuckir: what do you suggest? Perform Unicode normalization like in [this topic](http://stackoverflow.com/questions/10805711/javascript-string-comparison-fails?lq=1)? – little-dude May 28 '13 at 19:50
  • 1
    @emma, I imagine the strings stored in local storage come from that openpgp library to begin with. If that's the case, can you perform a simple round-trip check to verify Chrome does not normalize the strings when persisting them to local storage? – Frédéric Hamidi May 28 '13 at 20:09
  • @FrédéricHamidi, how can I do that? Is comparing `escape(strFromPgplib)` and `escape(strFromChromeStorage)` okay? – little-dude May 28 '13 at 20:32
  • 1
    @emma, `escape()` will probably do the job since it doesn't know about UNICODE combiners and surrogates. It may or may not be okay depending on your curiosity and exact requirements. By a round-trip test, I was thinking about something like: fetch problematic string from openpgp library, persist string to local storage, read string from local storage, compare strings. If they are different, then Chrome local storage is tampering with the values somewhere along the way, and I don't think it is supposed to do that. – Frédéric Hamidi May 28 '13 at 20:46
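The round-trip test described in the last comment could be sketched roughly as follows. `roundTripCheck` and `fakeStorage` are hypothetical names; the helper only assumes a storage object with callback-style `set(items, callback)` and `get(key, callback)` methods, which is the shape `chrome.storage.local` exposes. The in-memory stand-in is there so the sketch runs outside an extension.

```javascript
// Hypothetical helper: store a value, read it back, report whether it survived.
// `storage` is assumed to offer set(items, cb) and get(key, cb),
// like chrome.storage.local does.
function roundTripCheck(storage, key, value, done) {
  const items = {};
  items[key] = value;
  storage.set(items, function () {
    storage.get(key, function (result) {
      done(result[key] === value, result[key]);
    });
  });
}

// In-memory stand-in (an assumption, not the Chrome API itself).
const fakeStorage = {
  data: {},
  set(items, cb) { Object.assign(this.data, items); cb(); },
  get(key, cb) { const out = {}; out[key] = this.data[key]; cb(out); }
};

roundTripCheck(fakeStorage, "probe", "1\u00C44\u00B1", function (same) {
  console.log(same ? "unchanged" : "altered by storage"); // unchanged
});
```

Inside an extension you would pass `chrome.storage.local` instead of `fakeStorage` and feed it the exact string produced by the openpgp library.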
5

Try escaping your two strings to see which characters they contain. Frédéric has covered the possible cases, but since you're using PGP, one of them probably contains a non-printable binary character. Run:

escape(x);
escape(y);

in your console and you will be able to spot the offending character.
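If you would rather not rely on `escape()`, a small dump of the UTF-16 code units serves the same purpose. This is only a sketch, and `codeUnits` is a hypothetical helper name; it assumes `padStart` is available (ES2017).

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as U+XXXX.
function codeUnits(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.charCodeAt(i).toString(16).toUpperCase();
    out.push("U+" + hex.padStart(4, "0"));
  }
  return out.join(" ");
}

console.log(codeUnits("\u00C4"));  // U+00C4
console.log(codeUnits("A\u0308")); // U+0041 U+0308
console.log(codeUnits("1\u200B")); // U+0031 U+200B
```

Calling `codeUnits(x)` and `codeUnits(y)` side by side makes invisible or combining characters show up as extra entries.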

  • Indeed `escape()` returns `"%C4A%u0308"` for `"ÄÄ"` (first character single, second composite). This provides an easy way to compare without resorting to normalization. – Frédéric Hamidi May 28 '13 at 20:17
  • I don't clearly understand what escape does. escape(x) gives "/%28%3F%3A%5E%7C%3A%7C%2C%29%28%3F%3A%5Cs*%5C%5B%29+/g" and escape(y) gives "/%5C%5C%28%3F%3A%5B%22%5C%5C%5C/bfnrt%5D%7Cu%5B%5Cda-fA-F%5D%7B4%7D%29/g", if there is no mistake... it is the first time I use breakpoints, and all that stuff. – little-dude May 28 '13 at 20:27
  • @emmasculateur escape "encodes" non-printable characters, but also spaces. Its main purpose is to make the string transferable over net as a string. But it can be used to detect chars you may not see if just printed to console etc. –  May 28 '13 at 20:32
  • um, I meant *content* transferred as string.. sleep deprivation kicks in :-). You don't need to escape to compare as you have seen just comparing x === y returns false, but it's a nice tool to find out *which* char exist in the string that cannot be printed/seen. –  May 28 '13 at 20:46
1

BTW. try this code in JS (copy-paste) :)

console.log("A" == "А");

prints "false" :)

Comparing strings means comparing character codes. In some fonts, different character codes have the same "picture", like "l" and "I" (the first is a lowercase L, the second is an uppercase i). In my example above, one A is Cyrillic and the other is Latin.
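Here is the same idea with explicit escapes, so that copy-paste cannot hide which letter is which. `firstDifference` is a hypothetical helper that reports where two look-alike strings diverge; treat it as a sketch, not a full homoglyph detector.

```javascript
// Hypothetical helper: index of the first differing UTF-16 code unit, or -1 if equal.
function firstDifference(a, b) {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a.charCodeAt(i) !== b.charCodeAt(i)) return i;
  }
  return -1;
}

const latinA = "\u0041";    // Latin Capital Letter A
const cyrillicA = "\u0410"; // Cyrillic Capital Letter A

console.log(latinA === cyrillicA);                                        // false
console.log(firstDifference("C" + latinA + "T", "C" + cyrillicA + "T")); // 1
```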

Ivan Kuckir
0

If you are trying to do this in C#, it might have something to do with normalization: FormC vs. FormD vs. FormKC vs. FormKD. Reference: http://sharepoint.asia/two-exactly-same-strings-fail-while-comparison-in-c-net/

  • 1
    This link may answer the question, but it is recommended to copy main contents of the link to the question for further reference, as the link may become inactive. – Tomas Pastircak May 01 '14 at 18:25