I have the following structure in my MongoDB documents. As you'll see, there are three URLs, each with crawled set to true or false.

{
    "_id": {
        "$oid": "573b8e70e1054c00151152f7"
    },
    "domain": "example.com",
    "good": [
        {
            "crawled": true,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/11005-Cheap-booze!"
        },
        {
            "crawled": false,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/9445-This-week-s-voucher-codes"
        },
        {
            "crawled": false,
            "added": {
                "$date": "2016-05-17T21:34:34.485Z"
            },
            "link": "/threads/9445-This-week-s-voucher-codes_2"
        }
    ],

    "link_found": false,
    "subdomain": "http://www."
}

I'm trying to return specific fields, including only those URLs with crawled set to false. For this I have the following query:

.find({'good.crawled' : False}, {'good.link':True, 'domain':True, 'subdomain':True})

However, the result differs from what I expect: it returns all the URLs, regardless of whether crawled is true or false.

What is returned is:

{
    u'domain': u'cashquestions.com',
    u'_id': ObjectId('573b8e70e1054c00151152f7'),
    u'subdomain': u'http://www.',
    u'good': [
        {
            u'link': u'/threads/11005-Cheap-booze!'
        },
        {
            u'link': u'/threads/9445-This-week-s-voucher-codes'
        },
        {
            u'link': u'/threads/9445-This-week-s-voucher-codes_2'
        }
    ]
}

What is expected:

{
    u'domain': u'cashquestions.com',
    u'_id': ObjectId('573b8e70e1054c00151152f7'),
    u'subdomain': u'http://www.',
    u'good': [
        {
            u'link': u'/threads/9445-This-week-s-voucher-codes'
        },
        {
            u'link': u'/threads/9445-This-week-s-voucher-codes_2'
        }
    ]
}

How can I specify that only the links with crawled set to false are returned?

Ben
Adders
    Possible duplicate of http://stackoverflow.com/questions/3985214/retrieve-only-the-queried-element-in-an-object-array-in-mongodb-collection – undefined_variable May 18 '16 at 06:56

1 Answer

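A projection in find() only selects which fields come back; it does not filter the elements of an array, so every entry in good is returned once the document matches. You'll want to use the aggregation framework instead (this will work in MongoDB 3.0 and later):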

db.yourcollection.aggregate([
    // optional: keep only documents with at least one uncrawled link
    {$match: {'good.crawled': false}},
    // keep just the fields you need (_id is included by default)
    {$project: {good: 1, domain: 1, subdomain: 1}},
    // emit one temporary document per element of the good array
    {$unwind: '$good'},
    // keep only the uncrawled elements
    {$match: {'good.crawled': false}},
    // reassemble the array, undoing the $unwind
    {$group: {
        _id: '$_id',
        domain: {$first: '$domain'},
        subdomain: {$first: '$subdomain'},
        good: {$push: '$good'}
    }}
])
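Since the question uses Python, the same pipeline could be written for PyMongo roughly as follows. This is only a sketch: the function name build_uncrawled_pipeline and the variable collection are hypothetical, and it assumes collection is a pymongo Collection holding documents shaped like the one in the question.

```python
def build_uncrawled_pipeline():
    """Return the aggregation stages as a list of plain dicts."""
    return [
        # Optional: skip documents that have no uncrawled link at all.
        {"$match": {"good.crawled": False}},
        # Keep only the fields of interest (_id is included by default).
        {"$project": {"good": 1, "domain": 1, "subdomain": 1}},
        # Emit one temporary document per element of the good array.
        {"$unwind": "$good"},
        # Keep only the elements whose crawled flag is false.
        {"$match": {"good.crawled": False}},
        # Reassemble one document per _id, collecting the remaining elements.
        {"$group": {
            "_id": "$_id",
            "domain": {"$first": "$domain"},
            "subdomain": {"$first": "$subdomain"},
            "good": {"$push": "$good"},
        }},
    ]


# Usage, assuming `collection` is a pymongo Collection (hypothetical name):
# for doc in collection.aggregate(build_uncrawled_pipeline()):
#     print(doc["domain"], [g["link"] for g in doc["good"]])
```

As written, each element pushed into good still carries its crawled and added fields. Replacing the last accumulator with {"$push": {"link": "$good.link"}} should give the link-only shape shown in the expected output.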
Nic Cottrell