0

I am using CouchDB 3.1.1 to perform Mango queries against a database containing a large number of documents. A very common requirement in my application is to perform queries on a very specific and dynamic set of documents. From what I understand at this moment, these are the only choices I have on how to confront my problem:

  1. Make multiple requests to /db/_find each with a distinct "_id"
  2. Make a single call to /db/_find

Of the ways I can accomplish the second choice:

  • Use an "$or" array on all the "_id": value pairs
  • Use an "$or" array on all the values of the "_id" key

The second choice is what I would prefer to use since making multiple POST requests would incur overhead. Unfortunately using "$or" seems to get in the way of the query engine making use of the "_id" index.

Thus, choice #1 returns with a speedy 2 ms per transaction but the results are not sorted (requiring my application to do the sorting). Choice #2, given an array of 2 _ids, regardless of the $or syntax, takes over 3 seconds to render.

What is the most efficient way to use a CouchDB Mango query index against a specific set of documents?

Fast Example: Results using a single _id

{
   "selector": {
      "_id": "184094"
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 1
execution time: 2 ms

Slow Example: Results using $or of key / value pairs

{
   "selector": {
      "$or": [
         {
            "_id": "184094"
         },
         {
            "_id": "157533"
         }
      ]
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,454 ms

Slow Example: Results using $or array of values

{
   "selector": {
      "_id": {
         "$or": [
            "184094",
            "157533"
         ]
      }
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,522 ms

Slow Example: Results using $in (which is illegal but still returns results)

{
   "selector": {
      "_id": {
         "$in": [
            "184094",
            "157533"
         ]
      }
   },
   "fields": [
      "_id"
   ]
}

documents examined: 26,312
results returned: 2
execution time: 2,618 ms

Index: The registered index for _id

{
  "_id": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
  "_rev": "1-f951eb482f9a521752adfdb6718a6a59",
  "language": "query",
  "views": {
    "foo-index": {
      "map": {
        "fields": {
          "_id": "asc"
        },
        "partial_filter_selector": {}
      },
      "reduce": "_count",
      "options": {
        "def": {
          "fields": [
            "_id"
          ]
}}}}}

Explain: An 'explain' summary done to one of the slow queries. Note that the registered index was used.

{
 "dbname": "dnp_person_comment",
 "index": {
  "ddoc": "_design/508b5b51e6085c2f96444b82aced1e5dfec986b2",
  "name": "foo-index",
  "type": "json",
  "partitioned": false,
  "def": {
   "fields": [
    {
     "_id": "asc"
    }
   ]
  }
 },
 "partitioned": false,
 "selector": {
  "$or": [
   {
    "_id": {
     "$eq": "184094"
    }
   },
   {
    "_id": {
     "$eq": "157533"
    }
   }
  ]
 },
 "opts": {
  "use_index": [],
  "bookmark": "nil",
  "limit": 25,
  "skip": 0,
  "sort": {},
  "fields": [
   "_id"
  ],
  "partition": "",
  "r": [
   49
  ],
  "conflicts": false,
  "stale": false,
  "update": true,
  "stable": false,
  "execution_stats": false
 },
 "limit": 25,
 "skip": 0,
 "fields": [
  "_id"
 ],
 "mrargs": {
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
   "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
 }
}
NiGHTS
  • 3
  • 3
  • Show examples of actual queries corresponding to your observations, without which a quality answer is unlikely. A long running Mango query may mean there's no corresponding index, and/or it's performing a full index scan, and/or etc. – RamblinRose Jan 25 '21 at 17:50
  • I added some examples. – NiGHTS Jan 27 '21 at 16:28
  • Please see [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). Pasting images is not OK for multiple reasons, a good one being it is useless for those employing a screen reader. – RamblinRose Jan 27 '21 at 21:42
  • It appears there is somewhat of a recreation of the `_all_docs` view as a secondary index! If all you require are queries for `_id` see [1.3.2. /{db}/_all_docs](https://docs.couchdb.org/en/stable/api/database/bulk-api.html#db-all-docs). Using selectors with the `_find` endpoint as described is utterly wrong. – RamblinRose Jan 27 '21 at 21:45
  • Thank you for your response. Please understand that my example is overly simplified to eliminate superfluous details that might confuse the reader. In practice the query will always have at least one more filter, and appropriate indexes created accordingly. That being said, how would you suggest using _find, $or, and _id together with an index? – NiGHTS Jan 28 '21 at 03:09
  • Try using the `$in` operator with all `_id`s. Unfortunately without a *specific*, real example, this is question is going likely nowhere. I recommend editing your post by removing the images add adding the code for a real index and the code for a real query that reproduces your issue. – RamblinRose Jan 28 '21 at 13:15
  • Thank you again for your suggestion. While I raised my ebrow at the request to use an array operator on a discrete string object, I was surprised that it was allowed. Regardless, it did not help my situation. I've updated the post to remove images. As for adding more query parameters, how pray tell is adding more work for the database supposed to improve performance? I don't see the need to change my question because making this example work is enough for me. And I'm sure others would benefit from it's simplicity. – NiGHTS Jan 28 '21 at 16:29
  • That index doc is a map/reduce design document for a view, not a [mango index](https://docs.couchdb.org/en/stable/api/database/find.html#api-db-find-index). In Fauxton under the Mango query page, hit "Explain" and I believe you will see your query is using the _all_docs index. – RamblinRose Jan 28 '21 at 17:19
  • I had created the index exactly as described in the mango index document that you linked. I've updated my post to include the Explain Summary from Fauxton. Note that it tries to use the correct index, and yet chooses to misuse the index by testing the condition against all documents instead of doing two distinct optimized lookups (I assume). Is this just evidence of a poor CouchDB search algorithm or am I doing something wrong? – NiGHTS Jan 28 '21 at 17:45
  • Ah I misread; I thought that was your initial declaration document - that's a "raw" dump of the design document that was created. – RamblinRose Jan 28 '21 at 18:24
  • It seems you are wanting to perform an [SQL select-where-in with CouchDB](https://stackoverflow.com/questions/12763430/performing-a-where-in-query-in-couchdb). Consider leveraging the `keys` property in a straight up view query against your index. Not sure why `keys` isn't an option for Mango queries (AFAIK!). – RamblinRose Jan 28 '21 at 22:24
  • Views are not an option for me in my application. The variety and complexity of queries vary considerably and I can't make a custom view for every possible input scenario (I expect Mango to pick up the slack where an index could exist but doesn't yet). One constant is having a subset of _ids to select which should improve performance in such cases. I appreciate your help but I will need to come to terms with this shortcoming of Mango perhaps until this situation is addressed by the developers. – NiGHTS Jan 29 '21 at 01:57

0 Answers0