
I'm trying to scrape a list on a site whose links lead to other pages that have the same formatting.

I was able to create a collection of all the a tags, but when I try to visit that collection of pages, the key I assign the result to doesn't get added to my returned object.

Here's an example of what I'm trying to do with Stack Overflow:

var Xray = require('x-ray');
var x = Xray();
x('http://stackoverflow.com/', {
    title: x(['a@href'], 'title'),
}) (function(err, obj) {
    console.log(obj);
});

I expect obj.title to be a list of the titles of all the linked (a href) pages; instead, I just get an empty object.

However, if I follow only the first a href, I get its title with no problem:

var Xray = require('x-ray');
var x = Xray();
x('http://stackoverflow.com/', {
    title: x('a@href', 'title'),
}) (function(err, obj) {
    console.log(obj);
});

Has anyone run into this problem before?

JoshChang

2 Answers


I ran into that problem before. My solution is to collect the links first and then visit each one:

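var Xray = require('x-ray');
var x = Xray();

// First pass: collect the href of every <a> on the page into obj.links
x('http://stackoverflow.com/', {
    links: x('a', [{ link: '@href' }])
})(function(err, obj) {
    if (err) return console.error(err);
    // Second pass: visit each collected URL and scrape that page's <title>
    obj.links.forEach(function(item) {
        if (!item.link) return; // skip anchors that have no href
        x(item.link, 'title')(function(err, title) {
            if (err) return console.error(err);
            console.log(title); // should print the linked page's title
        });
    });
});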

Let me know if you run into any problems.
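If the goal from the question is a single array of titles (rather than logging each one from its own callback), one option is to wrap each per-page request in a Promise and combine them with Promise.all, which resolves to results in the same order as its input even when the responses arrive out of order. This is a minimal sketch, not part of x-ray itself: `scrapeTitle` and `collectTitles` are hypothetical names, and the stub only simulates network latency, so it runs without x-ray or a network connection.

```javascript
// Hypothetical stand-in for a real scraper. With x-ray's callback API it could be:
//   const scrapeTitle = url => new Promise((resolve, reject) =>
//     x(url, 'title')((err, title) => (err ? reject(err) : resolve(title))));
// Here we only simulate a request that takes a random amount of time.
function scrapeTitle(url) {
  return new Promise(resolve => {
    setTimeout(() => resolve('Title of ' + url), Math.random() * 50);
  });
}

// Visit every collected href and return [{ url, title }] in link order.
async function collectTitles(hrefs, scrape) {
  // Drop empty hrefs and duplicates so each page is requested only once.
  const urls = [...new Set(hrefs.filter(Boolean))];
  // Promise.all keeps results aligned with `urls`, regardless of completion order.
  const titles = await Promise.all(urls.map(url => scrape(url)));
  return urls.map((url, i) => ({ url, title: titles[i] }));
}

// Usage with hypothetical example URLs:
collectTitles(
  ['https://example.com/a', '', 'https://example.com/b', 'https://example.com/a'],
  scrapeTitle
).then(results => console.log(results));
```

Passing the scraper in as a parameter keeps the collecting logic independent of x-ray, so it can be tried with a stub first and then pointed at real pages.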

sylvery
  • Suppose obj has 50 hrefs. @sudo_mAniac, will the obj.forEach loop that calls x(link, "title") for each href make 50 GET requests in sequence? – Rohit Kumar Oct 09 '17 at 11:42
  • Hi @RohitasBehera, in my case I needed it to scan 96 links at a time, and it worked for me. If it doesn't work in your case, you can either break obj into chunks while populating it, or use the built-in 'promise' helper to handle the load. Also, in my case the requests were sent in the order they were sorted in my obj array, but the responses didn't come back in that order, probably because of differing server response times, so I later sorted them using a key. Hope that answers your question. – sylvery Oct 09 '17 at 23:17
  • @sudo thanks for the reply :) I found https://www.npmjs.com/package/x-ray#crawling-to-another-site in the docs. Could you please verify whether it does the same thing, i.e. crawl a list of hrefs in order? Thanks :) – Rohit Kumar Oct 10 '17 at 04:49
  • @RohitasBehera yes, it does. But you have to provide the array of URLs for it to crawl. – sylvery Oct 20 '17 at 02:50
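The comments above mention two ways to keep a long link list manageable: splitting it into chunks, and restoring the original order with a key because responses can come back in a different order than the requests were sent. Below is a minimal sketch of both ideas, assuming a hypothetical `scrape(url)` function that returns a Promise (simulated here, no x-ray or network involved). `crawlInChunks` is an illustrative name, not an x-ray API. With `chunkSize` set to 1, the URLs are visited strictly one after another.

```javascript
// Hypothetical scraper stub: resolves to a fake title after a random delay,
// so requests in the same batch can finish in a different order than they started.
function scrape(url) {
  return new Promise(resolve => {
    setTimeout(() => resolve('Title of ' + url), Math.random() * 50);
  });
}

// Visit `urls` in batches of `chunkSize`; each batch must finish before the next
// one starts, so at most `chunkSize` requests are in flight at any time.
async function crawlInChunks(urls, chunkSize, scrapeFn) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const results = new Array(urls.length);
  for (let start = 0; start < urls.length; start += chunkSize) {
    const batch = urls.slice(start, start + chunkSize);
    await Promise.all(
      batch.map(async (url, offset) => {
        // Store each result at its original index (the "key"), so the final
        // array follows input order even if responses arrive out of order.
        results[start + offset] = { url, title: await scrapeFn(url) };
      })
    );
  }
  return results;
}

// Usage (hypothetical URLs): 96 links, 10 at a time.
const links = Array.from({ length: 96 }, (_, i) => 'https://example.com/page/' + i);
crawlInChunks(links, 10, scrape).then(results => {
  console.log(results.length, results[0].url);
});
```

Chunking trades some speed for a bounded load on the target server; a larger `chunkSize` finishes sooner but sends more simultaneous requests.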

You could use X-ray's "Crawling to another site" feature:

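var Xray = require('x-ray');
var x = Xray();

x("http://stackoverflow.com/", {
  main: 'title',
  // Follow the first link matching the selector and scrape that page's title.
  // Note: '#gbar' appears to come from x-ray's Google example and may not match on stackoverflow.com.
  image: x('#gbar a@href', 'title'),
})(function(err, obj) {
  if (err) return console.error(err);
  console.log(obj); // { main: <this page's title>, image: <linked page's title> }
});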
Rohit Kumar