4

I have a list of about 50 tags in an array, and want to search through my documents to find records that match these tags.

Because they're user-submitted and mongoDB is case-sensitive, I'm using /wildcard/i as a means of searching. I know this is not the fastest way to do a search but I can't think of a better solution.

I can do my query in two ways. The first is to run a for loop over my tags array, and for each result, perform:

db.collection.find({tags: /<tag[x]>/i})

Or, I can collect all of the tags and run one single lookup using $or, like so:

db.collection.find({$or:[{tags:/<tag1>/i},{tags:/<tag2>/i},{tags:/<tag3>/i}, ... {tags:/<tag50>/i}]});

I have tried both, and found using $or to be significantly faster - but because of the work-in-progress state of my application, it's very difficult to tell whether this is because it's actually faster or whether my app is causing significant overhead in other areas (it is).

So for clarification, in MongoDB is a big query performed once faster than small queries performed many times?

EDIT: Another example would be whether looking up 3 individual records based on _id is faster than doing one lookup using {$or:[{_id: ObjectId([id1])},{_id: ObjectId([id2])},{_id: ObjectId([id3])}]}. Is less more?

Community
  • 1
  • 1
JVG
  • 20,198
  • 47
  • 132
  • 210
  • I edited your question a bit, because I think you'll get a better, more useful answer by asking a more specific question. – paulmelnikow Jul 30 '13 at 23:01
  • 1
    Thanks, but the edit defeats the point of the question, the tags are just an example of implementation, as you can see in the edit I'm much more concerned with efficacy of `$or` vs multiple `find()` lookups. – JVG Jul 30 '13 at 23:14
  • Would you revert my change to the title? Maybe you see my point. – paulmelnikow Jul 30 '13 at 23:28

1 Answers1

1

I recommend you adjust your schema so it keeps a normalized array of tags. When you insert a new document, do it like this:

tags : [ "business", "Computing", "PayPal" ],
lowercaseTags : [ "business", "computing", "paypal" ]

Similarly when you update the tags, update both arrays.

Create an index on lowercaseTags, and then when you want to query them, use a single query with the $in operator, and the normalized form of the search terms.

For example, to search for business iTunes YouTube, use this query:

db.collection.find( { tags : $in: [ "business", "itunes", "youtube" ] } )

This answer gives an example of this approach. It should be loads faster than what you have.

An alternate approach you can take is to create a text index and use the text command.

Both of these approaches are geared toward index optimization, and designing your schema to work well with Mongo. The payoff should be a lot higher than whatever difference there is between a single $or query and 50 simpler queries.

Community
  • 1
  • 1
paulmelnikow
  • 16,895
  • 8
  • 63
  • 114
  • Despite editing the question back to original your response still answers it :) cheers! – JVG Jul 30 '13 at 23:18
  • Just to clarify, you're talking about looping through the array of tags then doing an `$in` lookup on each rather than an `$or` lookup on all tags, even if the latter uses your lowercase, properly-indexed suggestion? – JVG Jul 30 '13 at 23:19