
What's wrong with this code?

I want to scrape multiple URLs with the x-ray package.

When I run the function, I get "{ title: [] }".

const Xray = require('x-ray');
const x = Xray();
const createCustomMedium = () => {
  const medium = { title: [] };
  const urls = [
    'https://medium.com./topic/javascript',
    'https://medium.com./topic/programming'
  ];
  urls.forEach(elem => {
    x(elem, {
      titles: ['article h4']
    })
      .then(articles => {
        medium.title.push(...articles.titles);
      })
      .catch(console.log);
  });
  return medium;
};
const scraped = createCustomMedium();
console.log(scraped);
  • Promises are asynchronous: once started, there is no guarantee about when they will finish. Code that comes after an asynchronous operation does NOT wait for it; it simply keeps executing. So here, you return medium before any of your promises can resolve. You need to use either async/await or callbacks to run the next operation once the work is done. More here: https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call – sinanspd Feb 29 '20 at 16:54
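To see the timing problem described in the comment without hitting the network, here is a minimal sketch that swaps x-ray for a hypothetical fakeScrape stub. It resolves to the same { titles: [...] } shape after a short delay; the URLs page-a and page-b are made up. The first function reproduces the bug from the question, and the second awaits each request before returning.

```javascript
// Hypothetical stand-in for x(url, { titles: ['article h4'] }):
// resolves to an object shaped like x-ray's result after 10 ms.
const fakeScrape = url =>
  new Promise(resolve =>
    setTimeout(() => resolve({ titles: [`${url} title 1`, `${url} title 2`] }), 10)
  );

const urls = ['page-a', 'page-b'];

// Buggy: forEach starts the requests but the function returns
// before any .then callback has run.
const createCustomMediumBroken = () => {
  const medium = { title: [] };
  urls.forEach(url => {
    fakeScrape(url).then(articles => medium.title.push(...articles.titles));
  });
  return medium; // still empty at this point
};

// Fixed: an async function awaits each request before returning.
const createCustomMedium = async () => {
  const medium = { title: [] };
  for (const url of urls) {
    const articles = await fakeScrape(url); // one request at a time
    medium.title.push(...articles.titles);
  }
  return medium; // callers receive a Promise of this object
};

console.log(createCustomMediumBroken()); // { title: [] }
createCustomMedium().then(scraped => console.log(scraped));
```

The for...of loop runs the requests one after another, which is the simplest fix; to run them in parallel, collect the promises and wait on them with Promise.all instead.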

1 Answer


You might want to look into Promise.all and do something like this:

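const Xray = require('x-ray');
const x = Xray();

const createCustomMedium = () => {
  const urls = [
    'https://medium.com./topic/javascript',
    'https://medium.com./topic/programming'
  ];

  // One promise per URL; Promise.all resolves once every request has finished.
  // Returning the chain directly avoids wrapping it in `new Promise(...)`.
  return Promise.all(urls.map(elem => x(elem, { titles: ['article h4'] })))
    // Merge each page's `titles` array into a single array of strings
    .then(result => result.reduce((titles, articles) => titles.concat(articles.titles), []))
    .then(titles => ({ title: titles }));
};

// The caller must wait for the promise before using the result
createCustomMedium()
  .then(scraped => console.log(scraped))
  .catch(console.error);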

where this expression returns an array of promises, one for each call to x(...):

urls.map(elem => x(elem, { titles: ['article h4'] }))
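The same Promise.all idea can be written with async/await. This is a sketch, not the answer's exact code: stubScrape is a hypothetical stand-in for x(url, { titles: ['article h4'] }) so it runs offline, and the delays are chosen so the second request finishes first. That shows Promise.all keeps the input order, not the completion order.

```javascript
// Hypothetical stand-in for x(url, { titles: ['article h4'] }): resolves to
// { titles: [...] } after `ms` milliseconds.
const stubScrape = (url, ms) =>
  new Promise(resolve => setTimeout(() => resolve({ titles: [`${url} title`] }), ms));

const createCustomMedium = async () => {
  // One promise per "URL"; the second one resolves first on purpose.
  const promises = [stubScrape('javascript', 30), stubScrape('programming', 10)];

  // Wait for every promise. Results follow the order of `promises`;
  // if any promise rejects, this await throws.
  const pages = await Promise.all(promises);

  // Each page is { titles: [...] }; flatMap (ES2019) merges them into one array.
  return { title: pages.flatMap(page => page.titles) };
};

createCustomMedium()
  .then(scraped => console.log(scraped)) // title is ['javascript title', 'programming title']
  .catch(console.error);
```

With x-ray, the promises array would instead come from urls.map(elem => x(elem, { titles: ['article h4'] })), as in the answer.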
  • What's the purpose of this code: result.reduce((titles, articles) => titles.concat(articles.titles), [])? – Adam Adam Feb 29 '20 at 17:38
  • To get a list of titles, basically a list of strings. – paperball Feb 29 '20 at 17:49
  • @AdamAdam It flattens the list of lists of titles into one single list. Each call to _x(elem, { titles: ['article h4'] })_ resolves to an object with a _titles_ array, so Promise.all resolves to a list of those objects, and the reduce concatenates their _titles_ arrays. If your target platform supports the ES2019 [Array.prototype.flat()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/flat) function, you could instead write _titles = result.map(articles => articles.titles).flat()_ – AlexNilsson Feb 29 '20 at 21:25
  • What if I want to get an object containing two arrays, { titles: [], links: [] }? – Adam Adam Feb 29 '20 at 21:42
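The flattening discussed in the comments, and the { titles, links } shape from the last comment, can be sketched on made-up data shaped like the Promise.all result (one object per page). The links selector 'article a@href' uses x-ray's @attribute syntax, but both selectors are assumptions about Medium's markup, not something the thread confirms.

```javascript
// Made-up sample standing in for what Promise.all resolves to: one object per
// page, as a selector like { titles: ['article h4'], links: ['article a@href'] }
// would produce (selectors are assumptions).
const result = [
  { titles: ['A', 'B'], links: ['https://example.com/a', 'https://example.com/b'] },
  { titles: ['C'], links: ['https://example.com/c'] }
];

// Titles only: reduce + concat (as in the answer) and two ES2019 alternatives.
const viaReduce = result.reduce((titles, page) => titles.concat(page.titles), []);
const viaFlat = result.map(page => page.titles).flat(); // result.flat() alone would not unwrap the objects
const viaFlatMap = result.flatMap(page => page.titles);

// Titles and links: the same reduce, with one accumulator array per key.
const merged = result.reduce(
  (acc, page) => ({
    titles: acc.titles.concat(page.titles),
    links: acc.links.concat(page.links)
  }),
  { titles: [], links: [] }
);

console.log(viaFlatMap); // [ 'A', 'B', 'C' ]
console.log(merged);
```

Two separate arrays only line up by index if every article yields exactly one title and one link. x-ray also documents a scope form, roughly x(url, 'article', [{ title: 'h4', link: 'a@href' }]), which should keep each title next to its own link; check the x-ray README for the exact behavior.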