0

I am designing a system that uses CouchDB with TouchDB/Cloudant Sync to cache the database on users' smartphones via replication. Say I have thousands of documents, each with a 100 KB attachment, and I want to free some space on the smartphone by removing a document.

After I remove a document, I want to be able to replicate it again from the server. This is different from deletion, which gives the document a new revision and prevents it from being replicated from the server again (because the deleted document on the smartphone is a child of the undeleted one).

I could obviously make redundant updates on the server documents, but that's inefficient...

Is there a way to "unsync" the document?

Oren

3 Answers

2

You could create a local database for each user, from which the user's smartphone is set up to replicate continuously. To push a document to the user, replicate it from the master db to the local user db, where the device's replication picks it up. To delete it remotely, delete it only in the local user db so that the delete propagates to the device. To re-copy it to the device, you have to manually overwrite the deleted document with its original from the master db.

skiqh
  • The "re-copy" won't work. The deleted document in the user's db is a child of the original document, so the replication from the master db to the user db will not re-copy the original document. – Oren Jun 11 '14 at 13:20
  • @Oren You are absolutely right. Edited my answer to reflect this. – skiqh Jun 12 '14 at 05:26
0
  1. Create a continuous or polling replication from the server to the smartphone, filtered to prevent too much space usage on the smartphone.
  2. Whenever you want to free some space, remove a document's id from the replication filter, delete the document from the smartphone (use compaction for a real cleanup), and keep its id in an unsynced_documents list.
  3. Whenever you want to resync a document, read it from the server and create it on the smartphone as a whole new document (ignore the revision). You can add a field resynced: true to the document's JSON. Don't forget to update the replication filter and the unsynced_documents list.
  4. When a "resynced" document changes on the server, it will be replicated to the smartphone, which already has a document with the same id (created on the smartphone). This will create a conflict. Resolve the conflict by choosing the server's revision (by deleting the revision with resynced: true).

I'm talking about the case where only one-way (Server -> User) replication is required, i.e., the users have read-only permissions. If you give the users write permission, you will need a way to distinguish between an intentional deletion of a document and "unsyncing".
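Steps 3 and 4 above can be sketched in Python over plain dicts. This is only an illustration under stated assumptions, not TouchDB or Cloudant Sync API code: the helper names make_resynced_copy and revisions_to_delete are hypothetical, and the conflicting revisions are assumed to have been fetched already as JSON documents.

```python
from typing import Any

Doc = dict[str, Any]


def make_resynced_copy(server_doc: Doc) -> Doc:
    """Step 3: build a brand-new local document from the server's copy.

    The server's _rev is dropped so the local store starts a fresh
    revision history, and the copy is tagged with resynced: true.
    """
    copy = {key: value for key, value in server_doc.items() if key != "_rev"}
    copy["resynced"] = True
    return copy


def revisions_to_delete(conflicting_revs: list[Doc]) -> list[str]:
    """Step 4: pick the revisions to delete so the server's one wins.

    Any conflicting revision carrying resynced: true was created on the
    smartphone, so its _rev is returned for deletion.
    """
    return [doc["_rev"] for doc in conflicting_revs if doc.get("resynced") is True]
```

Tagging the local copy is what makes step 4 mechanical: the conflict resolver never has to compare contents, it only has to look for the marker field.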

Oren
-1

Check out this bug on TouchDB. It sounds like the purge function is what you need.

However, this may affect re-replicating, as noted in the bug. I'm not sure whether TouchDB supports named-document replication, but I think that is how you'd work around the standard replication behaviour.
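For reference, on CouchDB itself a named-document replication is a POST to the _replicate endpoint with a doc_ids list. The sketch below only builds that request body. The database URL, target name, and document id are hypothetical, and whether TouchDB accepts the same request, or re-fetches a purged document this way, is not confirmed here.

```python
import json


def named_replication_body(source: str, target: str, doc_ids: list[str]) -> str:
    """Build the JSON body for POST /_replicate limited to the given ids."""
    return json.dumps({"source": source, "target": target, "doc_ids": doc_ids})


body = named_replication_body(
    "http://server.example:5984/master",  # hypothetical source database URL
    "userdb",                             # hypothetical target database name
    ["doc-17"],                           # hypothetical id of the document to re-fetch
)
print(body)
```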

Unfortunately, we've not exposed purging on Cloudant Sync yet (it's on the roadmap).

Mike Rhodes
  • From: http://wiki.apache.org/couchdb/Purge_Documents "If you are using _purge to recover space, you are almost certainly using CouchDB inappropriately. The most common reason developers use _purge inappropriately is when managing short-lived data (log entries, message queues, etc). A better remedy is to periodically switch to a new database and delete the old one (once the entries in it have all expired)." – Oren Jun 10 '14 at 09:06
  • It seems they also have problems with re-replicating purged documents, because the replicator looks only at new revisions created on the source database since the last successful replication... – Oren Jun 10 '14 at 09:17
  • CouchDB dev here. Purge is not meant for regular operations. It is a last resort, in case you committed your SSN or credit card number to CouchDB and need to get it out. – Jan Lehnardt Jun 11 '14 at 07:27
  • While I freely admit it's not intended use, combined with named document replication I think it may fit the use-case. However, if it'd be grossly problematic, I'd like to understand why. – Mike Rhodes Jun 23 '14 at 13:40