
I have stored 2775 URLs in my mLab database, and then I fetch each URL to get more information. I store all of the URLs in an array and pass it into a function to process. However, the code only runs up to about 1700 URLs and then stops. Here is my code (sorry about the code, this is my first time using Stack Overflow):

Product.find({}, (err, foundProducts) => {
  if (err) {
    console.log("err " + err);
  } else {
    foundProducts.forEach(function(foundProduct) {
      var updateProduct = service.updateTikiProduct(foundProduct.url);
    });
  }
});

updateTikiProduct: function(url) {
    const options = {
        url: url,
        json: true
    };
    request(options,
            function(err, res, body) {
                // SOME code to crawl data

                Product.findOneAndUpdate({
                    url: options.url
                }, {
                    $set: {
                        name: name,
                        brand: brand,
                        store: store,
                        location: location,
                        base_category: categoryType,
                        top_description: topDescription,
                        feature_description: featureDescription
                    }
                }, {
                    upsert: true,
                    new: true
                }, (err, createdProduct) => {
                    if (err) {
                        reject(err);
                    } else {
                        var currentDate = new Date();

                        if (!createdProduct.hasOwnProperty("price")) {
                            createdProduct.price.push({
                                current: currentPrice,
                                origin: originPrice
                            });
                            createdProduct.save();
                        } else if (createdProduct.hasOwnProperty("price") &&
                            createdProduct.price[0].date.getDate() != currentDate.getDate()) {
                            createdProduct.price.push({
                                current: currentPrice,
                                origin: originPrice
                            });
                            createdProduct.save();
                            console.log("Update price");
                        }
                        counter++;
                        console.log("url : " + options.url);
                        console.log("Created product " + counter + " success!");
                    }
                });
            });
}
Thomas Fritsch
Quan Bui
  • Is there any error on the console when your code stops? – Yogesh.Kathayat Aug 10 '18 at 10:08
  • No; while the code is running, my memory is nearly full (about 90%~95%). But when it gets to about 1700 URLs, the memory returns to normal and the console stops running. However, the console did not tell me that it stopped working. – Quan Bui Aug 10 '18 at 11:38

2 Answers


I guess Mongo has limits on how many items it returns from the db; you should try findAll or https://stackoverflow.com/a/3705615/4187058

Simon Pasku
  • But a collection with 2775 URLs is very small. You can easily get collections with thousands of documents via find(). The link you have given is about the limit on displaying items in the mongodb shell. – Yogesh.Kathayat Aug 10 '18 at 10:20
  • Well, can you take a look at `foundProducts.length`? I'm not sure how to raise the limit on the number of items returned by Mongo, but I'm sure you should look into it. Or maybe your forEach throws some exception; use try/catch there to check. – Simon Pasku Aug 10 '18 at 10:26
  • I have tried printing out the length of foundProducts and it returns exactly "2775". Thanks for your support. – Quan Bui Aug 10 '18 at 11:24

I think your code is not processing all the elements because you are processing all of them in parallel, which stops at some point once memory fills up.

foundProducts.forEach(function(foundProduct) {
    var updateProduct = service.updateTikiProduct(foundProduct.url);
});

What you should do is process them in series. You can use async/await for that; make the following changes and it will work:

for (let foundProduct of foundProducts) {
    var updateProduct = await service.updateTikiProduct(foundProduct.url);
}
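Note that `await` only actually pauses if `updateTikiProduct` returns a Promise, and the enclosing callback must be declared `async`. A minimal self-contained sketch of the serial pattern (the names `fakeFetch` and `processAll` are illustrative stand-ins, not from the question; `fakeFetch` plays the role of the real `request` call wrapped in a Promise):

```javascript
// Stand-in for wrapping the callback-style request in a Promise, e.g.:
//   new Promise((resolve, reject) =>
//     request({ url: url, json: true }, (err, res, body) =>
//       err ? reject(err) : resolve(body)));
function fakeFetch(url) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve("done:" + url); }, 5);
  });
}

async function processAll(urls) {
  var results = [];
  for (var url of urls) {
    // Only one "request" is in flight at a time, so memory use stays flat.
    results.push(await fakeFetch(url));
  }
  return results;
}

processAll(["a", "b"]).then(function (r) {
  console.log(r.join(","));  // prints "done:a,done:b"
});
```

In the question's code the same idea would mean having `updateTikiProduct` `return new Promise(...)` around the `request` call (which would also give the `reject(err)` in the question something to belong to) and marking the `Product.find` callback `async`.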
Yogesh.Kathayat
  • Thanks for your support, but I have tried that before and it does not work in series either. I have searched for async/await with the "forEach" function, and they say it CANNOT work the normal way. – Quan Bui Aug 10 '18 at 11:22
  • You can use `for..of` in place of `forEach`; it works with async/await. I have updated the answer. If it is not working in series either, is it throwing any error? Add a `try...catch` block and see what the error is. – Yogesh.Kathayat Aug 10 '18 at 11:26
  • But how can "await" run without declaring "async"? :v – Quan Bui Aug 10 '18 at 11:33
  • Obviously it will not work without async; you will have to add `async` to your callback: `Product.find({}, async (err, foundProducts) => {` – Yogesh.Kathayat Aug 10 '18 at 11:38
  • I have tried what you said but nothing changed. However, I think the problem is what you said, the code running in parallel. But I don't know why, while my code is still running, node.js uses some network (about 1MB~2MB), then the network suddenly drops to 0 Mbps and it does not work anymore. – Quan Bui Aug 10 '18 at 11:50