
I have a very strange situation with a couple of meteor applications that I am running on a semi-production machine.

Basically, I have a few documents (I'm actually not sure how many at the moment) that have duplicate fields:

{
        "_id" : ObjectId("5006040239bcf91fab6311e5"),
        "first_name" : "First Name",
        "landline" : "555 555-5555",
        "last_name" : "Last Name",
        "prior_email" : "newrandomemail@example.net",
        "prior_email" : "newrandomemail@example.net",
}

The MongoDB docs are a little vague about whether this state is valid:

BSON documents may have more than one field with the same name. Most MongoDB interfaces, however, represent MongoDB with a structure (e.g. a hash table) that does not support duplicate field names. If you need to manipulate documents that have more than one field with the same name, see the driver documentation for your driver.

Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate fields to an existing user document.

(Source: http://docs.mongodb.org/manual/core/document/#field-names)
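Since the duplicate names exist at the BSON level, one way to find out how many documents are affected is to scan the raw BSON bytes rather than parsed objects. The following is a minimal sketch, assuming each document is available as a raw Node.js Buffer (how to obtain that depends on the driver and is not shown here). The function names are hypothetical, and only top-level fields are checked.

```javascript
// Size in bytes of a BSON value of the given type starting at `pos`.
function valueSize(type, buf, pos) {
  switch (type) {
    case 0x01: case 0x09: case 0x11: case 0x12:
      return 8;                                   // double, date, timestamp, int64
    case 0x02: case 0x0d: case 0x0e:
      return 4 + buf.readInt32LE(pos);            // string, JS code, symbol
    case 0x03: case 0x04: case 0x0f:
      return buf.readInt32LE(pos);                // document, array, code with scope
    case 0x05:
      return 4 + 1 + buf.readInt32LE(pos);        // binary: length, subtype, bytes
    case 0x06: case 0x0a: case 0x7f: case 0xff:
      return 0;                                   // undefined, null, max key, min key
    case 0x07:
      return 12;                                  // ObjectId
    case 0x08:
      return 1;                                   // boolean
    case 0x0b: {                                  // regex: two C strings
      const patternEnd = buf.indexOf(0, pos);
      const flagsEnd = buf.indexOf(0, patternEnd + 1);
      return flagsEnd + 1 - pos;
    }
    case 0x0c:
      return 4 + buf.readInt32LE(pos) + 12;       // DBPointer
    case 0x10:
      return 4;                                   // int32
    case 0x13:
      return 16;                                  // decimal128
    default:
      throw new Error('Unknown BSON type 0x' + type.toString(16));
  }
}

// Returns the top-level field names in document order, duplicates included.
function topLevelFieldNames(buf) {
  const names = [];
  const end = buf.readInt32LE(0) - 1;             // index of the trailing 0x00
  let pos = 4;
  while (pos < end) {
    const type = buf[pos];
    const nameEnd = buf.indexOf(0, pos + 1);      // field name is a C string
    names.push(buf.toString('utf8', pos + 1, nameEnd));
    pos = nameEnd + 1;
    pos += valueSize(type, buf, pos);
  }
  return names;
}

// Returns each field name that appears more than once at the top level.
function duplicateFieldNames(buf) {
  const seen = new Set();
  const dups = new Set();
  for (const name of topLevelFieldNames(buf)) {
    if (seen.has(name)) dups.add(name);
    seen.add(name);
  }
  return [...dups];
}
```

Running duplicateFieldNames over every raw document in the collection and counting the non-empty results would give the number of affected documents.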

I guess there's some argument as to whether JSON should ever have duplicate keys (see "Does JSON syntax allow duplicate keys in an object?"), but I can't imagine that a JavaScript driver (Node.js, Meteor, etc.) would ever intentionally create them.
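As a small illustration of why a JavaScript object cannot carry the same key twice, the sketch below parses JSON text that contains two prior_email keys. It uses only the standard JSON.parse; the addresses are the ones from this question.

```javascript
// JSON text with the same key twice.
const text = '{"prior_email": "oldrandomemail@example.net", ' +
             '"prior_email": "newrandomemail@example.net"}';

// JSON.parse keeps a single property per key; the last occurrence wins.
const parsed = JSON.parse(text);

console.log(Object.keys(parsed));  // [ 'prior_email' ]
console.log(parsed.prior_email);   // newrandomemail@example.net
```

Because only one property survives, inspecting a parsed object cannot tell you whether the stored document held duplicates.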

It's a little complicated, though, because we have two Meteor applications sharing a single database. They're basically the front and admin ends of our software. In order to run both of the applications, I start one first with:

meteor -p 3000

Then I start the second with:

export MONGO_URL="mongodb://localhost:3001/meteor"
meteor -p 3002

The strangest thing is that when using the second application, the Meteor findOne() call for that document shows "oldrandomemail@example.net" for the "prior_email" value - the value that had been set previously and then later changed to "newrandomemail@example.net".

I know that these aren't exactly best practices for production deployment, but I'm wondering if anyone else has seen this or has any idea what might be triggering it...

EDIT: The code that updates the database is pretty basic:

Subscribers.update(this._id, {$set: {'prior_email': 'newrandomemail@example.net'}});
urban_raccoons
  • While technically duplicate keys are allowed in BSON, I would not expect a JSON document or JavaScript object to have duplicate keys. The observed behaviour is usually that the last value listed for a duplicate key ends up being the value in the JSON document (and some folks even (ab)use this as a [hack to add comments to JSON](http://fadefade.com/json-comments.html)). I also don't think your issue is related to using a single MongoDB deployment for multiple application instances. Can you share a code snippet showing how `prior_email` is being set? – Stennie Dec 19 '14 at 00:56
  • Also, by chance are you editing any data with a third party admin tool (i.e. outside of your Meteor code)? It's more likely for duplicate fields to accidentally be created using a lower-level driver like C or C++; higher level languages tend to use a dictionary or hash representation of the MongoDB document that won't allow for duplicate keys without extra effort ;-). – Stennie Dec 19 '14 at 01:01
  • No I'm not using any third party admin tool, just Meteor and the mongo console... – urban_raccoons Dec 19 '14 at 23:48
  • So what exactly is the question here? How to fix the documents? How did this happen? – tcurdt Dec 25 '14 at 17:51
  • Honestly, either would be amazing. Really any insight into wth is going on here... – urban_raccoons Dec 27 '14 at 00:43

1 Answer


If you are certain that prior_email identifies duplicate records, you can create a unique index with the dropDups: true index creation option (available only in MongoDB versions before 3.0, which removed dropDups):

 db.collection.ensureIndex({'prior_email' : 1}, {unique : true, dropDups : true})

This will keep the first unique document for each prior_email value, and drop any subsequent documents that would otherwise cause a duplicate key violation.

Important Note: Any document missing the prior_email field is treated as having a null value for it, so every such document after the first will be deleted. You can add the sparse: true index creation option so that the index only applies to documents that have a prior_email field.
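To make the effect of dropDups and sparse concrete, here is a small in-memory simulation in plain JavaScript. It is only a sketch of the behaviour described above, not MongoDB's implementation: simulateDropDups is a hypothetical helper, and it assumes documents are visited in array order.

```javascript
// Simulates which documents survive a unique index build with dropDups.
// docs:    array of plain objects
// field:   name of the indexed field
// options: { sparse: boolean }, mirroring the index option of the same name
function simulateDropDups(docs, field, options = {}) {
  const seen = new Set();
  const kept = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && options.sparse) {
      kept.push(doc);          // sparse: documents without the field are not indexed
      continue;
    }
    const key = hasField ? doc[field] : null;  // a missing field counts as null
    if (seen.has(key)) continue;               // later duplicate: dropped
    seen.add(key);
    kept.push(doc);
  }
  return kept;
}

const sample = [
  { _id: 1, prior_email: 'newrandomemail@example.net' },
  { _id: 2, prior_email: 'newrandomemail@example.net' },
  { _id: 3 },
  { _id: 4 },
];

const withoutSparse = simulateDropDups(sample, 'prior_email').map(d => d._id);
const withSparse = simulateDropDups(sample, 'prior_email', { sparse: true }).map(d => d._id);

console.log(withoutSparse);  // [ 1, 3 ]
console.log(withSparse);     // [ 1, 3, 4 ]
```

Without sparse, document 4 is lost only because it shares the implicit null value with document 3.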

Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.

References:
http://docs.mongodb.org/manual/tutorial/create-a-sparse-index/
http://docs.mongodb.org/manual/tutorial/create-a-unique-index/

Hope this helps.

SUNDARRAJAN K
  • I don't actually want prior_email to be a unique index. It would be fine for two separate documents to have the same prior_email field, but I don't want one single document to have two prior_email fields. – urban_raccoons Dec 19 '14 at 23:49