So here is the problem.

I have a string

    Белый Клык-0.fb2

and the NSString method length returns 16.

After saving the string in Core Data (SQLite backend), length returns 17, but visually the string stays the same:

    Белый Клык-0.fb2

And, of course, isEqualToString: returns NO.

After spending a lot of time experimenting, I figured out that the problem is this letter:

    й

Removing this letter solves the problem.

But it keeps driving me crazy: why is something like this happening?
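To see why й in particular is the culprit, here is a minimal Python sketch that reproduces the 16 vs 17 lengths. It assumes the value read back from the store is in Unicode Normalization Form D (NFD, canonically decomposed); Python's `unicodedata` module is used as a stand-in for Foundation, and the helper `changed_by_nfd` is a hypothetical name. NSString's length counts UTF-16 code units while Python's `len` counts code points, but for these characters the two counts are the same.

```python
import unicodedata
from typing import List, Tuple


def changed_by_nfd(s: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: (character, decomposed form) for each character NFD alters."""
    changes = []
    for ch in s:
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed != ch:
            changes.append((ch, decomposed))
    return changes


# The title as typed: precomposed characters (NFC).
original = unicodedata.normalize("NFC", "Белый Клык-0.fb2")
# Assumption: the value read back from the store is the NFD form.
stored = unicodedata.normalize("NFD", original)

print(len(original), len(stored))  # 16 17
print(original == stored)          # False

for ch, parts in changed_by_nfd(original):
    print(ch, "->", " + ".join(f"U+{ord(p):04X} {unicodedata.name(p)}" for p in parts))
```

Only й changes: it splits into U+0438 (и) followed by U+0306 COMBINING BREVE, which is the extra character. The other letters in the title have no canonical decomposition, which is consistent with removing й making the problem go away.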

Here are workarounds that work but don't satisfy me:

  1. stringByReplacingPercentEscapesUsingEncoding: - the string has to be converted both when it goes into the db and after every query
  2. transliterating the whole string - kind of a hack

And here are workarounds that don't work:

  1. stringWithUTF8String
  2. Converting escaped UTF8 characters back to their original form
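A plausible reason the UTF-8 based workarounds fail: re-encoding changes only the byte representation, not the sequence of code points, so a decomposed й stays decomposed. A minimal Python sketch, assuming the stored value is in NFD; the `encode`/`decode` round trip is only a rough analogue of going through UTF8String and stringWithUTF8String:, not the Cocoa calls themselves.

```python
import unicodedata

# Assumption: this is the decomposed (NFD) value that came back from the store.
stored = unicodedata.normalize("NFD", "Белый Клык-0.fb2")

# A UTF-8 encode/decode round trip changes only the byte representation;
# the code points, including the combining breve U+0306, are preserved.
round_tripped = stored.encode("utf-8").decode("utf-8")
print(len(round_tripped), round_tripped == stored)  # 17 True

# Only an explicit normalization step recomposes и + U+0306 into й.
recomposed = unicodedata.normalize("NFC", round_tripped)
print(len(recomposed))  # 16
```

On the Cocoa side, the counterpart of the `normalize("NFC", ...)` step would be NSString's `precomposedStringWithCanonicalMapping`.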

Please help me understand what happens to the string after it is saved in Core Data.

And is there a more elegant solution than what I did?

Wert1go
  • This might be a [Unicode normalization](http://unicode.org/reports/tr15/) related issue. Just try to compare your Core Data string to `[yourOriginalString decomposedStringWithCanonicalMapping]` and see if that works... (I've tested it, and it returns a length of 17 when called on the string from your example) – Alladinian Feb 25 '13 at 08:57
  • Thanks! That really works. Sadly, I had never even heard of canonical mapping. Can you add an answer? I'll mark it as the accepted answer. – Wert1go Feb 25 '13 at 09:17
  • If you really need to preserve the original string including the composed characters you have to store it as NSData: `[myString dataUsingEncoding:NSUTF16StringEncoding]` – Nikolai Ruhe Feb 25 '13 at 09:39

1 Answer

The issue might be related to Unicode normalization. Core Data seems to store the string decomposed (so й counts as 2: one for the base letter и and one for the combining breve), which is why you get the difference in length. If you decompose your original string before comparing it to what Core Data returns, it should work:

    [yourOriginalString decomposedStringWithCanonicalMapping]
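The comparison described above can be sketched in Python, with `unicodedata.normalize("NFD", ...)` playing the role of `decomposedStringWithCanonicalMapping`. The helper name `canonically_equal` is hypothetical, and normalizing both sides (rather than only the original) is a choice made here so the check still holds if the stored value ever comes back precomposed; the store's NFD output is an assumption.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after converting both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


original = unicodedata.normalize("NFC", "Белый Клык-0.fb2")
# Assumption: the store returns the decomposed form.
stored = unicodedata.normalize("NFD", original)

print(original == stored)                                # False
print(unicodedata.normalize("NFD", original) == stored)  # True (decompose the original only)
print(canonically_equal(original, stored))               # True
```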

Now, the reason behind this is beyond my field of expertise. I constantly use Core Data for managing my models, have worked multiple times with Greek / Russian strings, and never had such an issue. If anyone can expand on this and shed some light, I would also be very interested in the subject.

Alladinian
  • It's common for databases to do these kinds of canonical decompositions to enhance sorting and searching performance. When the database knows that the internal representation is always decomposed, a test for equality is just a bitwise compare. – Nikolai Ruhe Feb 25 '13 at 09:35
  • @NikolaiRuhe That makes sense, but shouldn't `NSString`'s comparison methods handle this automatically? – Alladinian Feb 25 '13 at 09:40
  • They do, but the performance gains come from where there are no NSStrings yet: the database's internal sorting, indexing and search all need fast comparison of strings. The enhanced performance is a result of knowing that you don't need the more complex composed-characters-aware algorithm. – Nikolai Ruhe Feb 25 '13 at 09:44
  • @NikolaiRuhe Yes, indeed that would happen in the case of a db query (although, generally speaking, a Core Data store could be an XML file as well), but the issue here is that the object fetched from the store (an `NSString` in this example) does not compare as expected with another `NSString`. Thanks for the insight, by the way! – Alladinian Feb 25 '13 at 09:52
  • Ah, now I understand. You meant NSString's `isEqual:` should handle character compositions transparently? Well, that is not the case. Character composition/decomposition is a common source of bugs or at least confusion that nobody but the application engineer can solve. At least Cocoa's `isEqual:` implementation is along the lines of most unicode frameworks. If you need automatic decomposition you can use `compare:`. – Nikolai Ruhe Feb 25 '13 at 12:29
  • @Alladinian For a few paths, I am getting a precomposed string. The path contains Ü. Any clue? – Parag Bafna Apr 17 '18 at 12:47
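The trade-off Nikolai Ruhe describes (normalize once when storing, so later equality checks are plain comparisons) can be sketched with a small lookup table. `NormalizedIndex` is a hypothetical class, not something Core Data or SQLite exposes, and whether the store really normalizes this way is the commenters' hypothesis rather than something this sketch demonstrates.

```python
import unicodedata
from typing import Any, Dict, List, Optional


class NormalizedIndex:
    """Hypothetical lookup table that keeps every key in NFD form."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Stored keys and incoming queries go through the same normalization.
        return unicodedata.normalize("NFD", name)

    def put(self, name: str, value: Any) -> None:
        self._items[self._key(name)] = value

    def get(self, name: str) -> Optional[Any]:
        # After normalization, lookup is an ordinary hash/equality check.
        return self._items.get(self._key(name))

    def keys(self) -> List[str]:
        return list(self._items)


title = unicodedata.normalize("NFC", "Белый Клык-0.fb2")
index = NormalizedIndex()
index.put(title, "book")

print(index.get(title))                                # book
print(index.get(unicodedata.normalize("NFD", title)))  # book
print(len(title), len(index.keys()[0]))                # 16 17
```

Reading a key back out of such a table gives the decomposed 17-code-point string, which matches the symptom in the question even though lookups by either form succeed.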