
I am new to non-relational databases. I have spent hours googling for a way to get rid of duplicates in my Cloudant database, which is made up of Twitter data.

For example, in the following screenshot there are duplicates in the text field. Is there any way to drop them using the Cloudant dashboard or any other method?

[Screenshot: Cloudant documents with duplicate values in the text field]

Thanks...

M.Qasim

1 Answer


There are no handy uniqueness constraints in Cloudant like those you'd find in a relational database. The only thing that is unique is the document id. As you're free to supply your own document id, you could make it, say, the MD5 hash of the tweet body string. That way you'd get a conflict if you tried to insert a dupe.
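
As a rough illustration of the hash-as-id idea, here is a minimal Python sketch. It assumes each tweet is a dict whose body lives in a "text" field (as in the question); the helper names tweet_doc_id and to_document are made up for this example and are not part of any Cloudant client library.

```python
import hashlib


def tweet_doc_id(text: str) -> str:
    """Derive a deterministic document id from the tweet body (MD5 hex digest)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def to_document(tweet: dict) -> dict:
    """Return a copy of the tweet with _id set to the hash of its "text" field.

    Assumption: the tweet body is stored under the key "text".
    """
    doc = dict(tweet)
    doc["_id"] = tweet_doc_id(tweet["text"])
    return doc


if __name__ == "__main__":
    first = to_document({"text": "hello", "user": "a"})
    second = to_document({"text": "hello", "user": "b"})
    # Both documents get the same _id, so the second insert would be rejected.
    print(first["_id"] == second["_id"])
```

Two tweets with identical text map to the same _id, so creating the second one without a matching _rev should come back as a 409 Conflict that your loader can simply skip.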

Otherwise you'd need to create a view that emits the body (or a hash thereof) as the key, and have a separate process that checks this view for dupes and removes them as necessary, as outlined in the accepted answer here:

Identifying Duplicates in CouchDB
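
A possible shape for the view-based approach, sketched in Python. The design document name "dedupe", the view name "by_text" and the function deletions_for_duplicates are hypothetical, and the tweet body is assumed to live in doc.text. The map function emits the body as the key and the revision as the value, so the rows carry everything needed to build deletion stubs.

```python
from itertools import groupby

# Hypothetical design document; the map function itself is JavaScript.
DESIGN_DOC = {
    "_id": "_design/dedupe",
    "views": {
        "by_text": {
            "map": "function (doc) { if (doc.text) { emit(doc.text, doc._rev); } }"
        }
    },
}


def deletions_for_duplicates(rows):
    """Given view rows ({"id", "key", "value"}), keep the first document
    for each key and return deletion stubs for the rest."""
    deletions = []
    # Views return rows sorted by key; sorting again keeps groupby safe
    # if the rows were collected from several requests.
    ordered = sorted(rows, key=lambda row: row["key"])
    for _, group in groupby(ordered, key=lambda row: row["key"]):
        for row in list(group)[1:]:
            deletions.append(
                {"_id": row["id"], "_rev": row["value"], "_deleted": True}
            )
    return deletions


if __name__ == "__main__":
    sample_rows = [
        {"id": "a", "key": "same tweet", "value": "1-abc"},
        {"id": "b", "key": "same tweet", "value": "1-def"},
        {"id": "c", "key": "other tweet", "value": "1-ghi"},
    ]
    print(deletions_for_duplicates(sample_rows))
```

The returned list is in the shape expected as the "docs" array of a _bulk_docs request, which would remove every duplicate in one call. Which copy to keep (here simply the first row per key) is a choice you may want to change.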

xpqz