My MongoDB documents have the following structure. As you can see, the good array holds 3 URLs, each with crawled set to True or False.
{
"_id": {
"$oid": "573b8e70e1054c00151152f7"
},
"domain": "example.com",
"good": [
{
"crawled": true,
"added": {
"$date": "2016-05-17T21:34:34.485Z"
},
"link": "/threads/11005-Cheap-booze!"
},
{
"crawled": false,
"added": {
"$date": "2016-05-17T21:34:34.485Z"
},
"link": "/threads/9445-This-week-s-voucher-codes"
},
{
"crawled": false,
"added": {
"$date": "2016-05-17T21:34:34.485Z"
},
"link": "/threads/9445-This-week-s-voucher-codes_2"
}
],
"link_found": false,
"subdomain": "http://www."
}
I'm trying to return specific fields, including only those URLs that have crawled set to False. For this I have the following query:
.find({'good.crawled' : False}, {'good.link':True, 'domain':True, 'subdomain':True})
However, the result differs from what I expect: it returns all the URLs, regardless of whether crawled is True or False.
What is returned:
{
u'domain': u'cashquestions.com',
u'_id': ObjectId('573b8e70e1054c00151152f7'),
u'subdomain': u'http://www.',
u'good': [
{
u'link': u'/threads/11005-Cheap-booze!'
},
{
u'link': u'/threads/9445-This-week-s-voucher-codes'
},
{
u'link': u'/threads/9445-This-week-s-voucher-codes_2'
}
]
}
What is expected:
{
u'domain': u'cashquestions.com',
u'_id': ObjectId('573b8e70e1054c00151152f7'),
u'subdomain': u'http://www.',
u'good': [
{
u'link': u'/threads/9445-This-week-s-voucher-codes'
},
{
u'link': u'/threads/9445-This-week-s-voucher-codes_2'
}
]
}
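To make the expected result concrete, here is a plain-Python sketch of the filtering I'm after. It is not a MongoDB query, only an illustration of the output shape; the helper name uncrawled_view is made up for this example, and the document is abbreviated (the _id and added fields are left out).

```python
# Plain-Python illustration of the desired result; this does NOT query MongoDB.
# The document below is an abbreviated copy of the one shown above.
doc = {
    "domain": "example.com",
    "subdomain": "http://www.",
    "good": [
        {"crawled": True, "link": "/threads/11005-Cheap-booze!"},
        {"crawled": False, "link": "/threads/9445-This-week-s-voucher-codes"},
        {"crawled": False, "link": "/threads/9445-This-week-s-voucher-codes_2"},
    ],
}


def uncrawled_view(document):
    """Hypothetical helper: keep domain/subdomain and only the uncrawled links."""
    return {
        "domain": document["domain"],
        "subdomain": document["subdomain"],
        "good": [
            {"link": entry["link"]}      # project just the link field
            for entry in document["good"]
            if not entry["crawled"]      # drop entries already crawled
        ],
    }


result = uncrawled_view(doc)
print(result)
```

I'd like the query itself to return this shape, rather than filtering the array on the client after the fact.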
How can I specify that only the links with crawled set to False are returned?