The best way to confirm index usage for a given query is using MongoDB's query explain()
feature. See Explain Results in the manual for your version of MongoDB for more information on the output fields and interpretation.
With regular expressions a main concern is efficient use of indexes. An unanchored substring match like /Q10/
will require examining all index keys (assuming a candidate index exists, as in your example). This is an improvement over scanning the full collection data (as would be the case without an index), but not as ideal as being able to check a subset of relevant index keys as is possible with a regex prefix search.
If you are routinely searching for substring matches and there is a common pattern to your strings, you could design a more scalable schema. For example, you could save whatever your Q10
value represents into a separate field (such as part_number
) where you could use a prefix match or an exact match (non-regex).
To illustrate, I set up some test data using MongoDB 3.4.2 and the mongo
shell:
// Needles: strings to search for
db.mod.insert([{number:'Q10'}, {number: 'foo-Q10'}, {number:'Q10-123'}])
// Haystack: some string values to illustrate key comparisons
for (i=0; i<1000; i++) { db.mod.insert({number: "I" + i}) }
Regex search without an index:
db.mod.find({ number: { $regex: /Q10/ }}).explain('executionStats')
The winningPlan
is a COLLSCAN
(collection scan) which requires the server retrieve every document in the collection to perform the comparison. Note that the original regex includes an unnecessary .*
prefix and suffix; this is implicit with a substring match so can be written more concisely as /Q10/
.
Highlights from the executionStats
section of the explain output:
"nReturned": 2,
"totalKeysExamined": 0,
"totalDocsExamined": 1003,
The explain output confirms there are no index keys examined and 1003 documents (all docs in this collection).
Add an index for the following two examples:
db.mod.createIndex({number:1}, {unique: true})
Regex substring search with an index:
db.mod.find({ number: { $regex: /Q10/}}).explain('executionStats')
The winningPlan
is still an IXSCAN
, but now has to examine all 1003 indexed string values to find substring matches:
"nReturned": 3,
"totalKeysExamined": 1003,
"totalDocsExamined": 3,
Regex prefix search with an index:
db.mod.find({ number: { $regex: /^Q10/}}).explain('executionStats')
The winningPlan
is an IXSCAN
(Index scan) which requires 3 key comparisons and 2 document fetches to return the 2 matching documents:
"nReturned": 2,
"totalKeysExamined": 3,
"totalDocsExamined": 2,
A prefix search isn't equivalent to the first two searches, as it will not match the document with value foo-Q10
. However, this does illustrate a more efficient regex search.
Note that totalKeysExamined
is 3. It might be reasonable to expect this to be 2 since there were only 2 matches, however this metric includes any comparisons with out-of-range keys (eg. end of a range of values). For more information see Explain Results: keysExamined
.