3

I have read putting indexes on low cardinality fields is pointless. Would this hold true for a compound index as such:

db.perms.createIndex({"owner": 1, "object_type": 1, "target": 1});

With queries as such:

db.perms.find({"owner": "me", "object_type": "square"});
db.perms.find({"owner": "me", "object_type": "circle", "target": "you"});

The amount of distinct object_type's would grow over time (probably no more than 10 or 20 max) but would only start out with about 2 or 3.

Similarly would a hash index be worth looking into?

UPDATE:

owner and target would grow immensely. Think of this like a file system wherein the owner would "own" a target (i.e. file). But, like unix systems, a file could be a folder, a symlink, or a regular file (hence the type). So although there are only 3 object_type's, a owner and target combination could have thousands of entries with an even distribution of types.

Community
  • 1
  • 1
Realistic
  • 1,038
  • 1
  • 10
  • 20
  • 1
    I just posted an answer containing some material on `cardinality` that may help you. [Index Cardinality](http://stackoverflow.com/questions/33545339/how-does-the-order-of-compound-indexes-matter-in-mongodb-performance-wise/33546159#33546159) – Abdullah Rasheed Nov 05 '15 at 16:47
  • Also, you can't create compound indexes that have hashed index fields. https://docs.mongodb.org/manual/core/index-compound/#index-type-compound – Abdullah Rasheed Nov 05 '15 at 17:08
  • It is not pointless it's just...well, be careful (same as any DB). Do not put them as the top level of a compound tree (index). As for your index: how many `target` values do you have? – Sammaye Nov 05 '15 at 17:14
  • Also what is the size of the document overall? It might be better to leave a field out and speed up queries if the document is small enough overall – Sammaye Nov 05 '15 at 17:18
  • Sorry for repeated comments but that first link is way out of date. Seems like it was written towards MongoDB 1.8 maybe – Sammaye Nov 05 '15 at 17:23
  • @Sammaye Added more info. Any given combination of all three values could return thousands of rows. Any combination of the two could return a factor greater (lets call it 10 thousand) . – Realistic Nov 05 '15 at 22:18
  • Hmm, it is hard to say if you should remove `object_type`. I would say that right here the answer is uncertain, it would need to be tested on your end. The thing that questions me is what it would be like to scale thousands of rows directly in the document. Since the document would not probably be a lot bigger than the index it should not be much overhead and you are returning full documents so you are loading the document anyway. Switching object_type to the back would be useless for queries not containing a target so that would be a pointless optimisation. – Sammaye Nov 05 '15 at 22:49
  • So I will answer that, immediately from this view, I cannot tell you which would be more performant. – Sammaye Nov 05 '15 at 22:50
  • Actually I will go for keeping the low cardinality field in there since even though scaling the index will be slower you will load less documents into your working set to understand the result set that will be returned – Sammaye Nov 05 '15 at 22:52

1 Answers1

1

I may not be able to answer your question, but giving my cents for index cardinality:

Index cardinality: it refers to the number of index points for each different type of index that MongoDB supports.

  1. Regular - for every single key that we put in the index, there's certainly going to be an index point. And in addition, if there is no key, then there's going to be an index point under the null entry. We get 1:1 relative to the number of documents in the collection in terms of index cardinality. That makes the index a certain size. It's proportional to the collection size in terms of it's end pointers to documents
  2. Sparse - when a document is missing a key being indexed, it's not in the index because it's a null and we don't keep nulls in the index for a sparse index. We're going to have index points that could be potentially less than or equal to the number of documents.
  3. Multikey - this is an index on array values. There'll be multiple index points (for each element of the array) for each document. So, it'll be greater than the number of documents.

number of index points

Let's say you update a document with a key called tags and that update causes the document to need to get moved on disk. Assume you are using the MMAPv1 storage engine. If the document has 100 tags in it, and if the tags array is indexed with a multikey index, 100 index points need to be updated in the index to accommodate the move?

Zameer Ansari
  • 28,977
  • 24
  • 140
  • 219