
I have a collection called Post. I have a mapping system that always ensures that each document has these fields:

  • id (int)
  • target (string)
  • type (string)
  • user_id
  • client_id
  • updated (string, 11-digit integer timestamp)
  • created (string, 11-digit integer timestamp)
  • enabled (bool)

This collection is queried and its documents are output through an API.

So some typical requests might be:

/post?type=image&user_id=2
/post?updated=35234423&order_by=client_id
/post?enabled=true&order_by=id

There is no guarantee which fields will end up in the find criteria or the sort, since both are built from optional query-string parameters.
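To make that concrete, the find and sort are assembled dynamically from the query string, roughly like this (a simplified sketch using the legacy PHP driver; the whitelist and variable names are illustrative, not my exact code):

    /* sketch: build the criteria from whichever whitelisted fields
       appear in the query string */
    $filterable = array('id', 'target', 'type', 'user_id', 'client_id',
                        'updated', 'created', 'enabled');

    $criteria = array();
    foreach ($filterable as $field) {
        if (isset($_GET[$field])) {
            $criteria[$field] = $_GET[$field];
        }
    }

    $cursor = $this->mongoDB->Post->find($criteria);

    /* order_by is optional, so the sort field is unpredictable */
    if (isset($_GET['order_by'])) {
        $cursor->sort(array($_GET['order_by'] => 1));
    }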

Recently, when the collection reached 8GB of data, I started getting this error:

"localhost:27017: too much data for sort() with no index. add an index or specify a smaller limit"

I have looked at the MongoDB indexing documentation and found it difficult to tell whether its indexes work in the same way as MySQL indexes.

Some threads I found on indexing (e.g. "MongoDB - too much data for sort() with no index error") suggest sorting on specific indexed fields so the index is hit. Obviously I cannot do that when a lot of my filtering and sorting is optional.

Could anyone suggest a firm solution? Should I index every field in the collection?


Thanks for the feedback, guys. I've started building an auto-index function:

public function get() {

    $indices = array();

    /* index definitions, keyed by collection name */
    $indices['Post'] = array(
        'fields' => array(
            /* dropDups only has an effect together with unique => true,
               so it is omitted from the non-unique indexes below */
            'id'             => array('unique' => true, 'dropDups' => true, 'background' => true),
            'client_id'      => array('background' => true),
            'image_id'       => array('background' => true),
            'user_id'        => array('background' => true),
            'publish_target' => array('background' => true),
            'type'           => array('background' => true),
            'status'         => array('background' => true),
            'text'           => array('background' => true)
        )
    );

    foreach ($indices as $key => $index) {

        /* set the collection */
        $collection = $this->mongoDB->{$key};

        /* drop all existing indexes (the mandatory _id index remains) */
        $collection->deleteIndexes();

        /* add an ascending single-field index for each field;
           the array form is used because passing a bare string
           to ensureIndex() is deprecated */
        foreach ($index['fields'] as $subKey => $data) {
            $collection->ensureIndex(
                array($subKey => 1),
                array_merge($data, array('name' => $subKey))
            );
        }
    }

    /* return the list of definitions */
    return $indices;
}
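To sanity-check the result, the legacy driver can list the created indexes back (a quick sketch using MongoCollection::getIndexInfo()):

    /* sketch: print each index name and its key spec */
    foreach ($this->mongoDB->Post->getIndexInfo() as $indexInfo) {
        echo $indexInfo['name'], ': ', json_encode($indexInfo['key']), "\n";
    }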
  • Can you share your queries? It would be helpful to analyse them. – user10 Jul 25 '13 at 10:11
  • @user10 He shares examples of his queries if you look about halfway down; this is where https://jira.mongodb.org/browse/SERVER-3071 will really help – Sammaye Jul 25 '13 at 10:54

2 Answers


You should know upfront what sort of queries are going to hit the server. Without that you can't do any optimisation and might run into sort issues like you have now.

If users can sort by any of the nine fields you have, you will need to create an index on each of them. However, remember that sometimes it makes more sense to create a compound index. For example, preventing the issue for:

/post?updated=35234423&order_by=client_id

can only be done by creating an index on:

{ updated: 1, client_id: 1 }

An index in MongoDB can only be used if your query (or sort) covers a contiguous left-hand prefix of the index's fields.

So { updated: 1, client_id: 1 } works optimally for:

  • find( { 'updated' : 1 } );
  • find( { 'updated' : 1, 'client_id' : 1 } );
  • find( { 'updated' : 1 } ).sort( { 'client_id' : 1 } );

But not for:

  • find( { 'client_id' : 1 } );
  • find( { 'client_id' : 1 } ).sort( { 'updated' : 1 } );
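With the legacy PHP driver from your question, that compound index can be created like this (a minimal sketch; $this->mongoDB and the Post collection are borrowed from your code):

    /* sketch: compound index covering filters on updated,
       optionally combined with a filter or sort on client_id */
    $this->mongoDB->Post->ensureIndex(
        array('updated' => 1, 'client_id' => 1),
        array('background' => true)
    );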

In order to reduce the amount of data and to prevent this error message, you can additionally add a limit() to each query. With 8GB of data, I doubt your UI can show that many results anyway, so using limit() makes sense.
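For example (a sketch; the page size of 50 is arbitrary):

    /* sketch: a capped result set lets the server do a top-k sort
       that stays within its in-memory sort limit */
    $cursor = $this->mongoDB->Post
        ->find(array('enabled' => true))
        ->sort(array('id' => 1))
        ->limit(50);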

  • Thank you. So I could build maybe 25 indices based on common requests, but would that be slower? – azz0r Jul 25 '13 at 12:29
  • More indexes will take more time when you insert/update/delete etc. You might want to reconsider your schema design. – Derick Jul 25 '13 at 12:33

Unfortunately I cannot think of a really good solution for queries this dynamic using indexes alone; however, this JIRA ticket on index intersection, https://jira.mongodb.org/browse/SERVER-3071, would really help you.

I suggest you watch that JIRA ticket.

  • Thanks for sharing, I'll keep an eye on it. – azz0r Jul 25 '13 at 12:30
  • index intersection doesn't help here, as you'd still need an index per field at least. – Derick Jul 25 '13 at 12:34
  • @Derick Indeed, but it helps more than the current situation: since MongoDB can currently only use one index per query, with intersection you could put an index on each field and get index usage, much like most SQL techs – Sammaye Jul 25 '13 at 12:35