26

This question is closely related to this one and I will consider the advice given with respect to schema design in a NoSQL context, yet I'm curious to understand this:

Actual questions

Suppose you have the following document:

    _id : 2      abcd
    name : 2     unittest.com
    paths : 4    
        0 : 3    
            path : 2     home
            queries : 4      
                0 : 3    
                    name : 2     query1
                    url : 2      www.unittest.com/home?query1
                    requests: 4

                1 : 3    
                    name : 2     query2
                    url : 2      www.unittest.com/home?query2
                    requests: 4

Basically, I'd like to know

  1. if it is possible to use MongoDB's positional $ operator (details) multiple times, or put differently, in update scenarios that involve array/document structures with a "degree of nestedness" greater than 1:

    { <update operator>: { "paths.$.queries.$.requests" : value } } (doesn't work)

    instead of "only" be able to use $ once for a top-level array and being bound to use explicit indexes for arrays on "higher levels":

    { <update operator>: { "paths.$.queries.0.requests" : value } }) (works)

  2. if possible at all, how the corresponding R syntax would look like.

Below you'll find a reproducible example. I tried to be as concise as possible.


Code example

Database connection

require("rmongodb")
db  <- "__unittest" 
ns  <- paste(db, "hosts", sep=".")
# CONNCETION OBJECT
con <- mongo.create(db=db)
# ENSURE EMPTY DB
mongo.remove(mongo=con, ns=ns)

Example document

q <- list("_id"="abcd")
b <- list("_id"="abcd", name="unittest.com")
mongo.insert(mongo=con, ns=ns, b=b)
q <- list("_id"="abcd")
b <- list("$push"=list(paths=list(path="home")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)
q <- list("_id"="abcd", paths.path="home")
b <- list("$push"=list("paths.$.queries"=list(
    name="query1", url="www.unittest.com/home?query1")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)
b <- list("$push"=list("paths.$.queries"=list(
    name="query2", url="www.unittest.com/home?query2")))
mongo.update(mongo=con, ns, criteria=q, objNew=b)

Update of nested arrays with explicit position index (works)

This works, but it involves an explicit index for the second-level array queries (nested in a subdoc element of array paths):

q <- list("_id"="abcd", paths.path="home", paths.queries.name="query1")
b <- list("$push"=list("paths.$.queries.0.requests"=list(time="2013-02-13")))
> mongo.bson.from.list(b)
    $push : 3    
        paths.$.queries.0.requests : 3   
            time : 2     2013-02-13

mongo.update(mongo=con, ns, criteria=q, objNew=b)
res <- mongo.find.one(mongo=con, ns=ns, query=q)
> res
    _id : 2      abcd
    name : 2     unittest.com
    paths : 4    
        0 : 3    
            path : 2     home
            queries : 4      
                0 : 3    
                    name : 2     query1
                    requests : 4     
                        0 : 3    
                            time : 2     2013-02-13


                    url : 2      www.unittest.com/home?query1

                1 : 3    
                    name : 2     query2
                    url : 2      www.unittest.com/home?query2

Update of nested arrays with positional $ indexes (doesn't work)

Now, I'd like to substitute the explicit 0 with the positional $ operator just like I did in order to have the server find the desired subdoc element of array paths (paths.$.queries).

AFAIU the documentation, this should work as the crucial thing is to specify a "correct" query selector:

The positional $ operator, when used with the update() method and acts as a placeholder for the first match of the update query selector:

I think I specified a query selector that does find the correct nested element (due to the paths.queries.name="query1" part):

q <- list("_id"="abcd", paths.path="home", paths.queries.name="query1")

I guess translated to "plain MongoDB" syntax, the query selector looks somewhat like this

{ _id: abcd, paths.path: home, paths.queries.name: query1 }

which seems like a valid query selector to me. In fact it does match the desired element/doc:

> !is.null(mongo.find.one(mongo=con, ns=ns, query=q))
[1] TRUE

My thought was that if it works on the top-level, why shouldn't it work for higher levels as well (as long as the query selector points to the right nested components)?

However, the server doesn't seem to like a nested or multiple use of $:

b <- list("$push"=list("paths.$.queries.$.requests"=list(time="2013-02-14")))
> mongo.bson.from.list(b)
    $push : 3    
        paths.$.queries.$.requests : 3   
            time : 2     2013-02-14

> mongo.update(mongo=con, ns, criteria=q, objNew=b)
[1] FALSE

I'm not sure if it doesn't work because MongoDB doesn't support this or if I didn't get the R syntax right.

Community
  • 1
  • 1
Rappster
  • 12,762
  • 7
  • 71
  • 120

3 Answers3

19

The positional operator only supports one level deep and only the first matching element.

There is a JIRA trackable for the sort of behaviour you want here: https://jira.mongodb.org/browse/SERVER-831

I am unsure if it will allow for more than one match but I believe it will due to the dynamics of how it will need to work.

Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • Great, thanks for the pointer!! So I guess it's a better design choice then to take it rather easy regarding arrays when higher degrees of nestedness come into play and rather rely on documents to implement such scenarios, right? That's also what's suggested by the top-answer to this question: http://stackoverflow.com/questions/4121666/updating-nested-arrays-in-mongodb – Rappster Feb 13 '13 at 15:37
  • @Rappster Exactly, remember that array handling for most operations is also in memory, i.e. $pull, $push and all that. So it is normally a good design choice to not go mad with them – Sammaye Feb 13 '13 at 15:42
  • The feature has been implemented in 2017 https://jira.mongodb.org/browse/SERVER-831 – FooBar Jun 30 '21 at 18:43
5

In case you can execute your query from the MongoDB shell you can bypass this limitation by taking advantage of MongoDB cursor's forEach function (http://docs.mongodb.org/manual/reference/method/cursor.forEach/)

Here is an example with 3 nested arrays:

var collectionNameCursor = db.collection_name.find({...});

collectionNameCursor.forEach(function(collectionDocument) {
    var firstArray = collectionDocument.firstArray;
    for(var i = 0; i < firstArray.length; i++) {
        var secondArray = firstArray[i].secondArray;
        for(var j = 0; j < secondArray.length; j++) {
            var thirdArray = secondArray[j].thirdArray;
            for(var k = 0; k < thirdArray.length; k++) {
                //... do some logic here with thirdArray's elements
                db.collection_name.save(collectionDocument);
            }
        }
    }
});

Note that this is more of a one time solution then a production code but it's going to do the job if you have to write a fix-up script.

Ivan Hristov
  • 3,046
  • 2
  • 25
  • 23
  • is the above operation atomic? – Maulik Soneji Mar 01 '17 at 09:29
  • Hi @MaulikSoneji, the operation is not atomic, as it is constituted of two different operations - read and write. How MongoDB will behave depends on the engine and the version you are running. Look here https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/#cursor-isolation I hope this info helps – Ivan Hristov Mar 01 '17 at 16:18
5

As @FooBar mentioned in the comments of the accepted answer, this feature was implemented in 2017 with MongoDB 3.6.

To do so, you must to use positional filters with arrayFilters conditions.
Applied to your example:

updateOne(
  { "paths.home": "home" },
  { $push : { 
      "paths.$.queries.$[q].requests": { time: "2022-11-15" } 
    }
  },
  { arrayFilters: [{ "q.name": "name" }] }
)

The postional operator $ refers to the filter { "paths.home": "home" }. Then, the positional filter $[q] refers to the arrayFilter { "q.name": "name" }.

Using this method, you can add as many positional filters as needed, as long as you put the condition in arrayFilters.

However, looking through the documentation of rmongodb, using arrayFilters is not possible at the moment. Alternatively, you could use another R package that has this feature implemented, such as Mongolite.

JulGio
  • 53
  • 2
  • 6