
Summary

I want to get rid of near duplicates in my Core Data-based iOS project, which uses Ensembles to sync with iCloud.

  • The sync with iCloud works basically well in my app.
  • The problem arises when a user creates similar objects on multiple devices before his persistent store is leeched by Ensembles (connected to iCloud).
  • This generates near duplicates, which is technically correct behavior.
  • My approach to remove these duplicates doesn't seem to work.

Detailed problem

A user can create NSManagedObjects on different devices before he is connected to iCloud. Let's say he has an NSManagedObject named Car which has a "To One" relationship to an NSManagedObject named Person, which in turn has a "To Many" relationship to Car.

[Figure: a simplified model]

OK, let's imagine the user has two devices and creates two NSManagedObjects on each. On the first device he creates a Car named "Audi" and a Person named "Raphael", connected through a relationship. On the other device he creates a Car named "BMW" and another Person named "Raphael", also connected to each other. Now the user has two similar objects across his devices: two Person objects, both named "Raphael".

My problem is that the user would end up having two Person objects with the name "Raphael" on each device after he synced.

This is actually correct, since the objects get their uniqueIdentifiers (to identify objects in Ensembles) when the user leeches his persistent store. The objects are factually different. But this is what I want to fix.

My approach

I implemented this delegate method and removed the duplicates in the reparationContext.

- (BOOL)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble
    shouldSaveMergedChangesInManagedObjectContext:(NSManagedObjectContext *)savingContext
    reparationManagedObjectContext:(NSManagedObjectContext *)reparationContext {

    [reparationContext performBlockAndWait:^{

        // Find duplicates
        // Change relationships and only use the inserted Person object (the one from iCloud)
        // Delete local Person object
        [reparationContext save:nil];
    }];
    return YES;
}

This seems to work well on the second device, which merges the data from the first device. Unfortunately, the local person still seems to be synced to iCloud even though it was deleted in the reparationContext.

This leads to a broken state: the first device then merges the changes from the second device and replaces its person with the one that was already deleted on the second device. Some syncs later the person is finally missing from the car relationship and the app throws syncing errors.

Steps to reproduce the problem

  • Step 1 (Device 1)

    • Create objects
    • Data: Car "Audi" -> Person "Raphael (Device 1)"
  • Step 2 (Device 2)

    • Create objects
    • Data: Car "BMW" -> Person "Raphael (Device 2)"
  • Step 3 (Device 1)

    • Leech data from store
    • Connect to iCloud
    • Send data to iCloud
    • Data: Car "Audi" -> Person "Raphael (Device 1)"
  • Step 4 (Device 2)

    • Leech data from store
    • Connect to iCloud
    • Merge data from iCloud
    • Replace local person from Device 2 with inserted person from Device 1
    • Delete local person from Device 2
    • Send data to iCloud
    • Data:
      Car "Audi" -> Person "Raphael (Device 1)"
      Car "BMW" -> Person "Raphael (Device 1)"
  • Step 5 (Device 1)

    • Merge data from iCloud
    • Replace local person from Device 1 with inserted person from Device 2 (this shouldn’t happen)
    • Delete local person from Device 1 (this shouldn’t happen)
    • Send data to iCloud
    • Expected data:
      Car "Audi" -> Person "Raphael (Device 1)"
      Car "BMW" -> Person "Raphael (Device 1)"
    • Actual data:
      Car "Audi" -> Person "Raphael (Device 2)"
      Car "BMW" -> Person "Raphael (Device 2)"

The local person object "Raphael (Device 2)" was deleted in Step 4, but it seems that it was still sent to iCloud, because in Step 5 it pops up as an insert in savingContext.insertedObjects in the shouldSaveMergedChangesInManagedObjectContext delegate method.

As far as I understand, Ensembles first pulls changes from iCloud, asks the user if everything is as expected via the delegate methods, then merges into the persistent store and sends deltas to iCloud after the merge.

Am I doing something wrong? Or is this an Ensembles bug?

Raphael

2 Answers


There is the issue that lars mentioned. You do have to be careful to always do things deterministically. Sorting on unique id is one way to do that.

Personally, I would handle this one of two other ways:

  1. Do the dedupe after a merge completes (again, making sure it is deterministic).
  2. Use carefully chosen global identifiers to control the dedupe for you.

For example, you could use the name "Raphael" as the unique id. The only thing you then need to be careful of is that when you go to create another Raphael on the same machine, it is called Raphael_1 (or whatever).

If your unique id is very likely to be unique (e.g. first + last name is unlikely to clash), Ensembles will automatically merge the person on different devices.
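The global-identifier approach can be sketched with the Ensembles delegate method that supplies identifiers. This is only a minimal sketch under assumptions that are not in the original post: the delegate class name `SyncDelegate` is hypothetical, the entity is called "Person", it has a string attribute "name", and the app keeps names locally unique. To my understanding, returning NSNull for an object lets Ensembles generate an identifier itself; treat that as an assumption to check against the Ensembles header documentation.

```objective-c
#import <CoreData/CoreData.h>
#import <Ensembles/Ensembles.h>

// Hypothetical delegate class; only the delegate method matters here.
@interface SyncDelegate : NSObject <CDEPersistentStoreEnsembleDelegate>
@end

@implementation SyncDelegate

- (NSArray *)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble
  globalIdentifiersForManagedObjects:(NSArray *)objects
{
    NSMutableArray *identifiers = [NSMutableArray arrayWithCapacity:objects.count];
    for (NSManagedObject *object in objects) {
        NSString *name = nil;
        // Assumption: entity "Person" with a string attribute "name".
        if ([object.entity.name isEqualToString:@"Person"]) {
            name = [object valueForKey:@"name"];
        }
        // Use the name as the identifier for Person objects that have one;
        // NSNull for everything else (assumed to mean "generate one for me").
        [identifiers addObject:(name.length > 0 ? (id)name : (id)[NSNull null])];
    }
    return identifiers;
}

@end
```

With identifiers like these, two Person objects named "Raphael" created on different devices would carry the same global identifier, which is what allows them to be treated as one object during the merge.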

Drew McCormack
  • Thanks! Using the name as unique id would be a good approach to let the merge happen automatically. But the user can edit his name, which could theoretically lead to near duplicates again.. but I guess very unlikely in the real world. **I still have 2 questions:** **1)** Why does Ensembles upload the object I deleted in the reparationContext? Is this a bug or is it the desired behavior? **2)** Why would you dedupe after a merge and not in the delegate method? Isn't this the dedicated place to do such things? – Raphael Jan 25 '16 at 18:39
  • There are some complications with using the delegate method. For many things, it is good. The problem is that change sets are not ordered during the merge, so if you inserted the object, then deleted it, there is no concept in that merge of which was first. Doing it after the merge guarantees the deletion is ordered afterwards. So, bug, not really, but complication, certainly. Deduping in the didSave... delegate method should be fine, or in the completion block. – Drew McCormack Jan 26 '16 at 13:19
  • In terms of using names as unique ids, you should always ensure it is locally unique. Then the only risk is they enter 'John Smith' on two devices, at about the same time. But in that case, it is very likely they are referring to the same person, so they probably should be treated as the same object. – Drew McCormack Jan 26 '16 at 13:20
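Deduping after the merge, as suggested in the comments above, might look roughly like the following. It is a sketch, not a tested implementation. Assumptions not taken from the original post: the class name `SyncDelegate`, the `managedObjectContext` property (the app's main context), the attribute names "name" and "uniqueIdentifier" on Person, and the relationship names "cars" and "person". The survivor is chosen by sorting on the unique identifier so that every device makes the same choice.

```objective-c
#import <CoreData/CoreData.h>
#import <Ensembles/Ensembles.h>

@interface SyncDelegate : NSObject <CDEPersistentStoreEnsembleDelegate>
// Assumed: the app's main context, set up elsewhere.
@property (nonatomic, strong) NSManagedObjectContext *managedObjectContext;
@end

@implementation SyncDelegate

- (void)persistentStoreEnsemble:(CDEPersistentStoreEnsemble *)ensemble
didSaveMergeChangesWithNotification:(NSNotification *)notification
{
    NSManagedObjectContext *context = self.managedObjectContext;
    [context performBlock:^{
        // Bring the merged changes into the main context first, then dedupe.
        [context mergeChangesFromContextDidSaveNotification:notification];
        [self removeDuplicatePersonsInContext:context];
    }];
}

- (void)removeDuplicatePersonsInContext:(NSManagedObjectContext *)context
{
    // Sort by name, then by unique id, so duplicates are adjacent and the
    // first object of each group is the same on every device.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    request.sortDescriptors = @[
        [NSSortDescriptor sortDescriptorWithKey:@"name" ascending:YES],
        [NSSortDescriptor sortDescriptorWithKey:@"uniqueIdentifier" ascending:YES]
    ];
    NSError *error = nil;
    NSArray *persons = [context executeFetchRequest:request error:&error];
    if (persons == nil) {
        NSLog(@"Fetch failed: %@", error);
        return;
    }

    NSManagedObject *survivor = nil;
    for (NSManagedObject *person in persons) {
        NSString *name = [person valueForKey:@"name"];
        if (survivor != nil && [[survivor valueForKey:@"name"] isEqual:name]) {
            // Duplicate: move its cars to the survivor, then delete it.
            // Copy the set because reassigning cars mutates the relationship.
            NSSet *cars = [[person valueForKey:@"cars"] copy];
            for (NSManagedObject *car in cars) {
                [car setValue:survivor forKey:@"person"];
            }
            [context deleteObject:person];
        } else {
            survivor = person; // first object of a new name group
        }
    }
    if (context.hasChanges && ![context save:&error]) {
        NSLog(@"Save failed: %@", error);
    }
}

@end
```

Because the deletion is saved as an ordinary change after the merge has finished, it should be ordered after the insert rather than competing with it inside the same merge.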

What I think is wrong with your "reparationContext" handler is that you delete the local object and keep the remote one. The other device will do the same, but from its side local and remote are swapped, so it deletes the other object. The reparation method has to be deterministic. So maybe you can sort the two Persons by the uniqueID or something similar and always delete the first. Then all devices would do the same and there should be no ping-pong sync bringing deleted data back.
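The deterministic rule can be illustrated with a small Foundation-only sketch. The function name `SurvivorIdentifier` and the sample identifiers are hypothetical. It picks the survivor purely from the identifiers, never from which copy happens to be local, so the result does not depend on the device or on the order in which the duplicates are seen.

```objective-c
#import <Foundation/Foundation.h>

// Returns the identifier of the object to keep among a group of duplicates.
// Sorting with -compare: is a literal string comparison, so every device that
// sees the same set of identifiers picks the same survivor.
static NSString *SurvivorIdentifier(NSArray<NSString *> *duplicateIdentifiers)
{
    NSArray<NSString *> *sorted =
        [duplicateIdentifiers sortedArrayUsingSelector:@selector(compare:)];
    return sorted.firstObject; // nil if the array is empty
}
```

For example, a device that sees the identifiers in the order "id-B", "id-A" and another that sees "id-A", "id-B" would both keep "id-A". If a third device later contributes a smaller identifier, every device that sees all three would switch to that one, so the choice can change but should still converge.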

lars
  • That's a good point! If I had a timestamp I could always take the oldest one. But taking a unique ID doesn't guarantee the method to be deterministic, right? If a third device joins in and has a unique ID that would be sorted to first place, it wouldn't be deterministic anymore. What I find really strange is that the repair method uploads deleted objects. Maybe my implementation is just wrong. – Raphael Jan 25 '16 at 09:57