
I have this collection in mongodb:

articles = [
    {
        id: 102,
        pagelinks: [ 104, 105, 107, 108 ],
        title: "page1"
    },
    {
        id: 104,
        pagelinks: [ 102, 205, 207, 108 ],
        title: "page2"
    },
    ...
]

I want to have this collection:

page_link_titles = [
    {
        id: 102,
        links: [ "page2", "page5", "page7", "page9" ]
    },
    {
        id: 104,
        links: [ "page1", "page5", "page7", "page9" ]
    },
    ...
]

I am using mapreduce in this way:

var map = function () {
                var output= {links:db.articles.find({id : {$in: db.articles.findOne({id:this.id}).pagelinks}}, {title: 1, _id:0})}
                    emit(this.id, output);
                };
var reduce = function(key, values) {
                var outs={ links:null}
                values.forEach(function(v){                    
                    if(outs.links ==null){
                        outs.links = v.links
                    }                     
                });
                return outs;
            };

db.articles.mapReduce(map,reduce,{out: 'page_link_titles'});

and I get this error:

    mapreduce failed: { 
        "errmsg" : "exception: ReferenceError: db is not defined near 't={links:db.articles.find({id: {$in: [d' (line2)", 
        "code" : 16722, 
        "ok" : 0
} at src/mongo/shell//collection.js:1224

Any suggestions?

Andi Keikha

2 Answers

nickmilon
  • What is the difference between this example and mine? I didn't get your point. – Andi Keikha Apr 21 '15 at 19:24
  • My point is that "the map function should not access the database for any reason", and db.articles.find(...) is trying to access the db. I don't know about that example or which version it was supposed to run on, but read the comments there: other people get the same error as you, and the author never explains how to avoid it. Anyway, this is not how map-reduce works in MongoDB. You can only reference the current document with this.XXX and any variables you pass in with the scope option. – nickmilon Apr 21 '15 at 19:33
  • Hm, I got it: "Since mongo 2.4, this functionality was removed. To use this functionality, you have to install Mongo 2.2.3." :-) http://blog.knoldus.com/2013/02/03/joins-now-possible-in-mongodb-2-4/ – nickmilon Apr 21 '15 at 19:43
  • Anyway, even then this would not be feasible in a production environment, since it is very expensive to call find() for each and every document. If you want to do a join with MR, you have to make two MR passes, one for each collection, and use the 'reduce' output action in the second pass. – nickmilon Apr 21 '15 at 19:52
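
To illustrate the comment about `this` and the `scope` option, here is a minimal sketch of a map function that never touches `db`. It assumes a small id-to-title table (the hypothetical `titlesById`) has been built beforehand; the names `mapLinks` and `reduceLinks` and the stand-in `emit` are also illustrative, the latter added only so the snippet runs without a MongoDB server.

```javascript
// Hypothetical id -> title table. In the shell it would be built once,
// outside map, and handed over through mapReduce's `scope` option.
const titlesById = { 102: "page1", 104: "page2", 105: "page3" };

// Stand-in for the server-provided emit(), so the sketch runs without MongoDB.
const emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Map reads only `this` (the current document) and the scoped table.
function mapLinks() {
    const links = [];
    this.pagelinks.forEach(function (id) {
        const title = titlesById[id];
        // Link ids with no known title are skipped.
        if (title !== undefined) links.push(title);
    });
    emit(this.id, { links: links });
}

// Each key is emitted once, so reduce only has to pass the value through.
function reduceLinks(key, values) {
    return values[0];
}

// Simulate the map phase over the two documents from the question.
const docs = [
    { id: 102, pagelinks: [104, 105, 107, 108], title: "page1" },
    { id: 104, pagelinks: [102, 205, 207, 108], title: "page2" }
];
docs.forEach(function (doc) { mapLinks.call(doc); });
console.log(JSON.stringify(emitted));
```

In the shell the same two functions could presumably be passed as `db.articles.mapReduce(mapLinks, reduceLinks, { out: "page_link_titles", scope: { titlesById: titlesById } })`. This is only workable while the table is small enough to send along with the command, which is unlikely to hold for a full Wikipedia dump.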

You can try this code snippet:

Sample collection:

db.articles.insert([
    {
        _id: 102,
        pagelinks: [ 104, 105, 107, 108 ],
        title: "page1"
    },
    {
        _id: 104,
        pagelinks: [ 102, 205, 207, 108 ],
        title: "page2"
    },
    {
        _id: 105,
        pagelinks: [ 102, 205, 207, 104 ],
        title: "page3"
    },    
    {
        _id: 107,
        pagelinks: [ 105, 205, 207, 104 ],
        title: "page3"
    }
]);

The magic:

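db.articles.find().forEach( function (doc){
    // The output document reuses the _id of the source article.
    var obj = {
        "_id": doc._id
    };
    var links = [];
    // Look up each linked article; ids with no matching document are skipped.
    doc.pagelinks.forEach( function (x){
        var result = db.articles.findOne({ "_id": x });
        if (result && result.title) links.push(result.title);
    });

    // Keep only the first occurrence of each title.
    obj["links"] = links.filter(function (value, index, self) {
        return self.indexOf(value) === index;
    });
    db.page_link_titles.save(obj);
});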

The result:

db.page_link_titles.find();

/* 1 */
{
    "_id" : 102,
    "links" : [ "page2", "page3" ]
}

/* 2 */
{
    "_id" : 104,
    "links" : [ "page1" ]
}

/* 3 */
{
    "_id" : 105,
    "links" : [ "page1", "page2" ]
}

/* 4 */
{
    "_id" : 107,
    "links" : [ "page3", "page2" ]
}
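
The loop above issues one findOne() per link. If the id and title fields fit in memory, a possible variant is to build the lookup once and resolve every link locally. The sketch below runs on a plain array rather than a live collection; `buildPageLinkTitles` and `sampleArticles` are hypothetical names, and the data mirrors the sample collection above.

```javascript
// Resolve pagelinks to titles with a single in-memory lookup table.
function buildPageLinkTitles(articles) {
    // One pass to index titles by _id.
    const titleById = new Map(articles.map(function (a) {
        return [a._id, a.title];
    }));
    return articles.map(function (a) {
        const titles = a.pagelinks
            .filter(function (id) { return titleById.has(id); }) // drop dangling ids
            .map(function (id) { return titleById.get(id); });
        // A Set keeps the first occurrence of each title, in order.
        return { _id: a._id, links: Array.from(new Set(titles)) };
    });
}

const sampleArticles = [
    { _id: 102, pagelinks: [104, 105, 107, 108], title: "page1" },
    { _id: 104, pagelinks: [102, 205, 207, 108], title: "page2" },
    { _id: 105, pagelinks: [102, 205, 207, 104], title: "page3" },
    { _id: 107, pagelinks: [105, 205, 207, 104], title: "page3" }
];

const pageLinkTitles = buildPageLinkTitles(sampleArticles);
console.log(JSON.stringify(pageLinkTitles, null, 2));
```

In the shell, the input could come from something like `db.articles.find({}, { pagelinks: 1, title: 1 }).toArray()`, with each returned document then written to page_link_titles. That assumes the whole projection fits in the shell's memory.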
chridam
  • I did this: stackoverflow.com/a/22739813/2949810 – Andi Keikha Apr 21 '15 at 20:50
  • db.articles.find().forEach( function (newArticle) { newArticle.pagelinks = db.articles.find( { "_id": { $in: newArticle.pagelinks }}, {title: 1, _id:0} ).toArray(); newArticle.text = null; db.page_linkTitles.insert(newArticle); } ); – Andi Keikha Apr 21 '15 at 20:50
  • With the one I did, all pagelinks are empty :(, I am trying your way. – Andi Keikha Apr 21 '15 at 21:11
  • @Andi Do you have a sample collection that I can test on from this end? I tested with the sample above and works just fine. – chridam Apr 22 '15 at 07:16
  • It's wikipedia collection. – Andi Keikha Apr 22 '15 at 17:22
  • @Andi I meant sample documents that have the same schema as above? – chridam Apr 22 '15 at 17:46
  • Okay, it's very strange: it works with the example but not with the real data. Example document: { "id": 502360, "pagelinks": { 21650, 30869108, 22598261, ...}, "title": "Mitchell College"}. I also found out that some of the ids in pagelinks might not exist in articles; for example, there is no page with id 22598261. It took me 2 days to find that out :P – Andi Keikha Apr 24 '15 at 15:34
  • But I see in your code that you check that! I am looking for other reasons. – Andi Keikha Apr 24 '15 at 15:41
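
To narrow down which link ids have no matching article, a small check like the following may help. It is a sketch over an in-memory array, not a tested fix for the real data: `findDanglingLinks` is a hypothetical helper, the second document is invented for illustration, and the id field name is a parameter because the thread uses both `id` and `_id`.

```javascript
// Report, per article, the pagelinks that point at no known article.
function findDanglingLinks(articles, idField) {
    const known = new Set(articles.map(function (a) { return a[idField]; }));
    const dangling = new Map();
    articles.forEach(function (a) {
        // Treat a missing or non-array pagelinks field as "no links".
        const links = Array.isArray(a.pagelinks) ? a.pagelinks : [];
        const missing = links.filter(function (id) { return !known.has(id); });
        if (missing.length > 0) dangling.set(a[idField], missing);
    });
    return dangling;
}

const realisticDocs = [
    // Shortened from the example document in the comment above.
    { id: 502360, pagelinks: [21650, 22598261], title: "Mitchell College" },
    // Invented document so that 21650 resolves.
    { id: 21650, pagelinks: [502360], title: "hypothetical page" }
];

const danglingById = findDanglingLinks(realisticDocs, "id");
danglingById.forEach(function (missing, articleId) {
    console.log(articleId + " links to missing ids: " + missing.join(", "));
});
```

If every link is reported as dangling, it may be worth checking that the queries use the same key field as the documents, since the real data stores `id` while the snippets in this thread query `_id`.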