11

I have some documents saved in a collection (called urls) that look like this:

{
    payload:{
        url_google.com:{
            url:'google.com',
            text:'search'
        }
    }
},
{
    payload:{
        url_t.co:{
            url:'t.co',
            text:'url shortener'
        }
    }
},
{
    payload:{
        url_facebook.com:{
            url:'facebook.com',
            text:'social network'
        }
    }
}

Using the mongo CLI, is it possible to look for subdocuments of payload that match /^url_/? And, if that's possible, would it also be possible to query on the match's subdocuments (for example, make sure text exists)?

I was thinking something like this:

db.urls.find({"payload":{"$regex":/^url_/}}).count();

But that's returning 0 results.

Any help or suggestions would be great.

Thanks,

Matt

I159
  • 29,741
  • 31
  • 97
  • 132
Matt P
  • 163
  • 1
  • 1
  • 4

2 Answers2

6

It's not possible to query against document keys in this way. You can search for exact matches using $exists, but you cannot find key names that match a pattern.

I assume (perhaps incorrectly) that you're trying to find documents which have a URL sub-document, and that not all documents will have this? Why not push that type information down a level, something like:

{
  payload: {
    type: "url",
    url: "Facebook.com",
    ...
  }
}

Then you could query like:

db.foo.find({"payload.type": "url", ...})

I would also be remiss if I did not note that you shouldn't use dots (.) is key names in MongoDB. In some cases it's possible to create documents like this, but it will cause great confusions as you attempt to query into embedded documents (where Mongo uses dot as a "path separator" so to speak).

dcrosta
  • 26,009
  • 8
  • 71
  • 83
  • 1
    Thanks for your answer. Unfortunately, I don't have control over the design of the documents being inserted in to the collection :/ Also, I'm not sure what you mean when you say embedded document? Do you have a link? Is that the same as a subobject? – Matt P Sep 06 '11 at 01:59
  • 1
    I think it means the same as what you call a "subobject". "Embedded document" is a more data-centric term, whereas "subobject" is a more programming-centric term (for those languages where Mongo documents are expressed as a hierarchy of objects). If you can't change the structure of the documents in your collection, you could consider using map-reduce to rewrite them into a more queryable format (such as the one I proposed here) – dcrosta Sep 06 '11 at 12:55
5

You can do it but you need to use aggregation: Aggregation is pipeline where each stage is applied to each document. You have a wide range of stages to perform various tasks.

I wrote an aggregate pipeline for this specific problem. If you don't need the count but the documents itself you might want to have a look at the $replaceRoot stage.

EDIT: This works only from Mongo v3.4.4 onwards (thanks for the hint @hwase0ng)

db.getCollection('urls').aggregate([
    {
        // creating a nested array with keys and values
        // of the payload subdocument.
        // all other fields of the original document
        // are removed and only the filed arrayofkeyvalue persists
        "$project": {
            "arrayofkeyvalue": {
                "$objectToArray": "$$ROOT.payload"
            }
        }
    },
    {
        "$project": {
            // extract only the keys of the array
            "urlKeys": "$arrayofkeyvalue.k"
        }
    },
    {
        // merge all documents
        "$group": {
            // _id is mandatory and can be set
            // in our case to any value
            "_id": 1,
            // create one big (unfortunately double
            // nested) array with the keys
            "urls": {
                "$push": "$urlKeys"
            }
        }
    },
    {
        // "explode" the array and create
        // one document for each entry
        "$unwind": "$urls"
    },
    {
        // "explode" again as the arry
        // is nested twice ...
        "$unwind": "$urls"
    },
    {
        // now "query" the documents
        // with your regex
        "$match": {
            "urls": {
                "$regex": /url_/
            }
        }
    },
    {
      // finally count the number of
      // matched documents
        "$count": "count"
    }
])
Fabian
  • 546
  • 5
  • 14
  • it is worth to mention that this only works with mongo db v3.4.4 onwards...too bad that i am working with v3.2 – hwase0ng Jul 22 '21 at 07:30