
I am generating analytics that are stored in MongoDB. Each statistic lives in a document that is written to the database whenever the statistic needs to be updated.

Let's say my statistic document has two fields, fieldA and fieldB, identifying what I am counting, and a field sum that holds the statistic. I want exactly one document for every combination of fieldA and fieldB values, and the sum field is updated every time I have something new to count.

The logical choice is an upsert, but upserts are not atomic with respect to one another: two concurrent upserts can both find no matching document and both insert one, producing duplicate documents with the same fieldA and fieldB values. So I need some kind of unique index on the combination of fieldA and fieldB. However, I would like to avoid a compound index, which seems to be a very expensive way to do this.

Currently I compute an MD5 hash of fieldA + fieldB and store it as the _id, but this feels more like a hack than a true solution. Still, it is the cheapest solution I could think of, since it reuses the index on _id, which already exists. Does this solution seem appropriate? Will it cause problems in Mongo later? Shouldn't this be built into Mongo clients like Spring Data?
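For reference, here is a minimal sketch of the hashing approach described above. The helper names `naive_id` and `stat_id` are hypothetical and not part of any Mongo client, and the fields are assumed to be strings. It also illustrates one caveat of hashing a plain concatenation: different (fieldA, fieldB) pairs can produce the same input string, which a length prefix avoids.

```python
import hashlib


def naive_id(field_a: str, field_b: str) -> str:
    """MD5 of the plain concatenation, as described in the question."""
    return hashlib.md5((field_a + field_b).encode("utf-8")).hexdigest()


def stat_id(field_a: str, field_b: str) -> str:
    """Hypothetical variant: length-prefix each value before hashing.

    The prefix keeps ("ab", "c") and ("a", "bc") from collapsing into
    the same input string, which plain concatenation allows.
    """
    payload = f"{len(field_a)}:{field_a}|{len(field_b)}:{field_b}"
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Plain concatenation maps two different key pairs to the same id.
naive_clash = naive_id("ab", "c") == naive_id("a", "bc")

# The length-prefixed variant keeps them apart.
prefixed_clash = stat_id("ab", "c") == stat_id("a", "bc")
```

The resulting hex string would then be used as the document's _id, so that the existing _id index enforces uniqueness.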

Remi D
  • What's wrong with using a unique compound index? Sure it uses more space than a hack, but it's the right solution. Either that, or make your `_id` be `{fieldA: value, fieldB: value}` if you're not using the `_id` for another purpose. Also, hashes aren't unique, so your current approach won't always work. – JohnnyHK Jan 15 '16 at 14:43
  • Using `{fieldA: value, fieldB: value}` as the `_id` seems pretty nice, but also a bit dangerous, as the field order would be crucial, wouldn't it? As for the collision risk of hashes, I'm not too worried about that, as the probability is negligible (cf. http://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions) – Remi D Jan 18 '16 at 04:48
  • Yep, field order is significant if you use `_id` for this. – JohnnyHK Jan 18 '16 at 05:22
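To illustrate the field-order point from the comments, here is a small sketch. It assumes the `_id` subdocument is built by a single hypothetical helper, `compound_id`, and uses ordered JSON serialization only as a stand-in for how an order-sensitive encoding such as BSON sees the two documents; it does not talk to MongoDB.

```python
import json


def compound_id(field_a, field_b):
    """Hypothetical helper: always build the _id keys in the same order.

    Python dicts preserve insertion order, so every caller that goes
    through this function produces fieldA first, then fieldB.
    """
    return {"fieldA": field_a, "fieldB": field_b}


canonical = compound_id("x", "y")
swapped = {"fieldB": "y", "fieldA": "x"}  # same values, different key order

# Python dict equality ignores key order, so this mismatch is easy to miss.
same_in_python = canonical == swapped

# An order-preserving serialization tells the two apart.
same_serialized = json.dumps(canonical) == json.dumps(swapped)
```

Routing every write through one such helper is one way to keep the order consistent, so that the same (fieldA, fieldB) pair always maps to the same `_id`.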

0 Answers