0

I have a model with a normal index using Mongoose.

const mod = new mongoose.Schema({
  number: { type: String, required: true, index: { unique: true } },
});

I am using a regex in a query to get the mod corresponding to a specific number. Will my regex query utilize the index that is on this model?

query.number = {
      $regex: `.*Q10.*`
}
modelName.find(query)

I am concerned that this is looking through the entire collection without using the indexes. What would be the best way to know if I am using the index. Or if you happen to know a way that will utilize the index could you show me? Here I am looking for all close to Q10, not trying to get an exact match. Would using /^Q10.* be better and use the index?

Referencing MongoDB regex information on index and comments made on this post stackoverflow previous question

Community
  • 1
  • 1
mjwrazor
  • 1,866
  • 2
  • 26
  • 42

2 Answers2

3

The best way to confirm index usage for a given query is using MongoDB's query explain() feature. See Explain Results in the manual for your version of MongoDB for more information on the output fields and interpretation.

With regular expressions a main concern is efficient use of indexes. An unanchored substring match like /Q10/ will require examining all index keys (assuming a candidate index exists, as in your example). This is an improvement over scanning the full collection data (as would be the case without an index), but not as ideal as being able to check a subset of relevant index keys as is possible with a regex prefix search.

If you are routinely searching for substring matches and there is a common pattern to your strings, you could design a more scalable schema. For example, you could save whatever your Q10 value represents into a separate field (such as part_number) where you could use a prefix match or an exact match (non-regex).

To illustrate, I set up some test data using MongoDB 3.4.2 and the mongo shell:

// Needles: strings to search for
db.mod.insert([{number:'Q10'}, {number: 'foo-Q10'}, {number:'Q10-123'}])

// Haystack: some string values to illustrate key comparisons
for (i=0; i<1000; i++) { db.mod.insert({number: "I" + i}) }

Regex search without an index:

db.mod.find({ number: { $regex: /Q10/ }}).explain('executionStats')

The winningPlan is a COLLSCAN (collection scan) which requires the server retrieve every document in the collection to perform the comparison. Note that the original regex includes an unnecessary .* prefix and suffix; this is implicit with a substring match so can be written more concisely as /Q10/.

Highlights from the executionStats section of the explain output:

 "nReturned": 2,
 "totalKeysExamined": 0,
 "totalDocsExamined": 1003,

The explain output confirms there are no index keys examined and 1003 documents (all docs in this collection).

Add an index for the following two examples:

db.mod.createIndex({number:1}, {unique: true})

Regex substring search with an index:

 db.mod.find({ number: { $regex: /Q10/}}).explain('executionStats')

The winningPlan is still an IXSCAN, but now has to examine all 1003 indexed string values to find substring matches:

 "nReturned": 3,
 "totalKeysExamined": 1003,
 "totalDocsExamined": 3,

Regex prefix search with an index:

 db.mod.find({ number: { $regex: /^Q10/}}).explain('executionStats')

The winningPlan is an IXSCAN (Index scan) which requires 3 key comparisons and 2 document fetches to return the 2 matching documents:

 "nReturned": 2,
 "totalKeysExamined": 3,
 "totalDocsExamined": 2,

A prefix search isn't equivalent to the first two searches, as it will not match the document with value foo-Q10. However, this does illustrate a more efficient regex search.

Note that totalKeysExamined is 3. It might be reasonable to expect this to be 2 since there were only 2 matches, however this metric includes any comparisons with out-of-range keys (eg. end of a range of values). For more information see Explain Results: keysExamined.

Stennie
  • 63,885
  • 14
  • 149
  • 175
  • Great advice. Although the .* is only implied on mongodb I guess. On Nodejs I had to have it as `$regex: `^${ myValue }.*`. This ended up giving me the results I wanted and limiting the amount scanned. Thanks – mjwrazor Mar 22 '17 at 14:38
  • I did run into a problem though. Say I have Q1200 = Q1209. When I type in Q12 it shows nothing. But as soon as I hit 0 to be Q120 I see all 0-9 results. But if I type lets say 7 in to be Q1207 and then hit backspace to be Q120 I only see one result of Q1207. Using a angularjs typeahead to see the results. Also there is a 50 item limit. – mjwrazor Mar 22 '17 at 15:25
  • 1
    Fixed. i just needed to add a delay on the uib-typeahead so it would see the results. `typeahead-wait-ms="100"` – mjwrazor Mar 22 '17 at 16:18
1

With the index enabled, For case sensitive regular expression queries, the query traverses the entire index (load into memory), then load the matching documents to be returned into memory. Its expensive but still could be better than a full collection scan.

For /John Doe/ regex ,mongo will scan the entire keyset in the index then fetch the matched documents.

However, if you use a prefix query :

Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

Rahul
  • 15,979
  • 4
  • 42
  • 63
  • Ah I needed this info for sure. `"prefix expression", which means that all potential matches start with the same string` – mjwrazor Mar 22 '17 at 13:27