
My customer is suddenly experiencing problems with an HTML scraper job written in Node.js. I have narrowed the cause down to the Request module, so I wrote a small test application that only fetches the HTML of the given URL via Request, like this:

var request = require('request');

// Fetch the page and print only the status line; any failure is logged as-is.
request('https://www.politi.dk/da/ompolitiet/jobipolitiet/ledige_stillinger/ledigestillinger', function(err, res, body){
    if(err){
        console.log(err);
    } else {
        console.log('statusCode:', res.statusCode);
        console.log('statusMessage:', res.statusMessage);
    }
});

The example above does not work, though. Running the application gives me the following error:

{ Error: socket hang up
    at TLSSocket.onHangUp (_tls_wrap.js:1137:19)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:111:20)
    at TLSSocket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  code: 'ECONNRESET',
  path: null,
  host: 'www.politi.dk',
  port: 443,
  localAddress: undefined }
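The stack trace shows the reset arriving on a `TLSSocket`. As a further diagnostic, here is a minimal sketch (the helper `checkTlsHandshake` is hypothetical) that opens a bare TLS connection with Node's core `tls` module and sends no HTTP request at all. If the handshake already fails with `ECONNRESET`, the server is probably dropping the connection at the TLS level (for example over protocol version or cipher support); if it succeeds, the server is more likely reacting to something in the HTTP request itself.

```javascript
const tls = require('tls');

// Hypothetical diagnostic: open a TLS connection without sending any HTTP
// request, and resolve with what was negotiated if the handshake completes.
function checkTlsHandshake(host, port = 443) {
  return new Promise((resolve, reject) => {
    // servername makes the client send SNI, as a normal https request does.
    const socket = tls.connect({ host, port, servername: host }, () => {
      const info = {
        protocol: socket.getProtocol(), // negotiated TLS version, e.g. 'TLSv1.2'
        cipher: socket.getCipher().name, // negotiated cipher suite
      };
      socket.end();
      resolve(info);
    });
    // An ECONNRESET before the callback runs means the handshake was cut off.
    socket.on('error', reject);
  });
}

module.exports = { checkTlsHandshake };
```

For example, `checkTlsHandshake('www.politi.dk').then(console.log, console.error)` prints either the negotiated protocol and cipher or the error.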

However, if I change the URL to any other URL, it works, and I get the following:

statusCode: 200
statusMessage: OK
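Since other URLs work, one way to test whether the problem is really in the Request module, or in the connection to politi.dk itself, is to fetch the same page with Node's built-in `https` module, which Request builds on. A minimal sketch with a hypothetical `fetchStatus` helper:

```javascript
const https = require('https');

// Hypothetical helper: fetch a URL with Node's core https module (no Request
// involved) and resolve with the status line only.
function fetchStatus(url) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, (res) => {
      res.resume(); // discard the body; only the status line matters here
      resolve({ statusCode: res.statusCode, statusMessage: res.statusMessage });
    });
    // A "socket hang up" / ECONNRESET surfaces here as an 'error' event.
    req.on('error', reject);
  });
}

module.exports = { fetchStatus };
```

Calling `fetchStatus('https://www.politi.dk/da/ompolitiet/jobipolitiet/ledige_stillinger/ledigestillinger').then(console.log, console.error)` should either print the status line or fail the same way Request does. A `socket hang up` here as well would point away from Request and towards the server or the network path.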

I have tried other URLs on the politi.dk domain, and they fail too, so I conclude that the problem is specific to this domain when requesting pages via the Request module. The strange thing is that it worked until recently. What can cause this problem? Could a change in the server settings of politi.dk be causing this now? I'm struggling to find anything helpful on Google. I found the nodejs-what-does-socket-hang-up-actually-mean thread here on SO, which describes exactly the same problem, but the answers don't help me much.
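As for which server-side change could cause this: one frequent reason (an assumption here, not confirmed for politi.dk) is a server or firewall that starts resetting connections from clients that don't look like a browser, for instance requests without a `User-Agent` header, which Request doesn't send unless you set one. The sketch below uses core `https` and hypothetical names (`browserLikeOptions`, `fetchWithBrowserHeaders`, and a placeholder User-Agent string) to repeat the request with browser-like headers:

```javascript
const https = require('https');
const { URL } = require('url');

// Placeholder browser User-Agent (hypothetical); any current browser string would do.
const BROWSER_UA =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36';

// Build https.request options for `url` with browser-like headers.
function browserLikeOptions(url, userAgent = BROWSER_UA) {
  const u = new URL(url);
  return {
    hostname: u.hostname,
    port: u.port || 443, // URL reports '' for the default port
    path: u.pathname + u.search,
    method: 'GET',
    headers: {
      'User-Agent': userAgent,
      Accept: 'text/html,application/xhtml+xml',
    },
  };
}

// Send the request and resolve with the status code only.
function fetchWithBrowserHeaders(url, userAgent) {
  return new Promise((resolve, reject) => {
    const req = https.request(browserLikeOptions(url, userAgent), (res) => {
      res.resume(); // drain the body; it isn't needed for this check
      resolve(res.statusCode);
    });
    req.on('error', reject); // ECONNRESET would still surface here
    req.end();
  });
}

module.exports = { browserLikeOptions, fetchWithBrowserHeaders };
```

With Request itself, the equivalent is to pass the headers in the options object: `request({ url: url, headers: { 'User-Agent': '...' } }, callback)`. If that returns 200 while the plain request hangs up, header filtering on the server side is the likely explanation.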

Anyone?

RonRonDK
  • Are you in or from Denmark? I'm not, but it looks like requests to domains with `.dk` go to the Denmark Hostmaster. Does that sound right? Maybe you just need to pass options to Request that would turn on the cookie jar so the Hostmaster can apply a cookie to your request. The option is `{ jar: true }`. You can pass it in an options object, as the first parameter, together with the URL. – Max Baldwin May 14 '18 at 14:48
  • @MaxBaldwin yes I am in Denmark. I have tried passing `{ jar: true }` to the request, but it's still the same error. I can't see what cookies would have to do with simply scraping the HTML of a page though. But thank you for your input. – RonRonDK May 15 '18 at 07:35
  • Some domains will bounce you if they can't track you. Because you're trying to go through a hostmaster, I thought that might be the case. Are you making this request from localhost? Maybe it is because you are trying to access an `https` domain from localhost or from an `http` domain? – Max Baldwin May 15 '18 at 14:38
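For reference, passing the `{ jar: true }` option from the first comment together with the URL looks like this with Request. This is a minimal sketch with hypothetical helper names; `jar: true` turns on Request's global cookie jar, so cookies set by a response are sent back on later requests to the same site:

```javascript
// Hypothetical helper: build a Request options object with the cookie jar on.
// The URL goes in `url`; `jar: true` makes Request use its global cookie jar.
function jarOptions(url) {
  return { url: url, jar: true };
}

// Hypothetical wrapper around the Request library (npm package `request`).
function fetchWithCookieJar(url, callback) {
  const request = require('request'); // loaded here so jarOptions works standalone
  request(jarOptions(url), callback);
}

module.exports = { jarOptions, fetchWithCookieJar };
```

As the asker's reply notes, enabling the jar did not change the error in this case.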

0 Answers