
I am trying to scrape the site https://shmoti.com, but node-fetch's fetch method never receives a response. The same code works fine for other websites.

Here is my code

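const fetch = require("node-fetch");
// Pass the options object directly; `options = {...}` would create an implicit global.
fetch("https://shmoti.com", { headers: { "User-Agent": "Mozilla/5.0" } })
  .then(res => res.text()).then(body => console.log(body)).catch(err => console.error(err));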

The first promise, which should resolve to the response object, stays pending the whole time.

I have tried sending a User-Agent header, and I have set a timeout of 60 seconds. I can scrape this site successfully with the Scrapy library in Python, but with the fetch method the request always times out.

Why is this happening, and how can I fix it?

I can ping the website and open it in my browser; only fetching it from Node fails.

Natesh bhat

1 Answer


I know this is old, but I had a similar issue. node-fetch worked against GitHub but not against another site I wanted to access, while wget worked fine. Using netcat, I compared the headers of the fetch request with those of wget's request. The difference was that fetch sent Connection: close, while wget sent Connection: Keep-Alive. Adding this to my fetch headers got the job done:

{
  "Connection": "Keep-Alive"
}
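
For completeness, here is a minimal sketch of where that header goes in a node-fetch call. It assumes node-fetch v2 (the CommonJS version used in the question); buildKeepAliveOptions and fetchText are hypothetical helper names introduced only for illustration, and whether this fixes any particular site is not guaranteed.

```javascript
// Hypothetical helper: builds the options object passed as fetch's second argument.
function buildKeepAliveOptions(userAgent = "Mozilla/5.0") {
  return {
    headers: {
      "User-Agent": userAgent,
      // Ask for a persistent connection instead of "Connection: close".
      "Connection": "keep-alive",
    },
  };
}

// Hypothetical helper: fetches a URL as text using the options above.
// Assumption: node-fetch v2 is installed; it is required lazily so the
// options builder can be used and inspected without it.
async function fetchText(url) {
  const fetch = require("node-fetch");
  const res = await fetch(url, buildKeepAliveOptions());
  return res.text();
}
```

It could then be called as fetchText("https://shmoti.com").then(console.log).catch(console.error).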

See this post for more info on using netcat. It made this super easy to debug, and I didn't previously know about the tool.
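
If netcat is not available, the same header comparison can be approximated in Node itself. The following is only a sketch: startHeaderDump and parseRequestHeaders are hypothetical names, port 8080 is an arbitrary choice, and the listener speaks plain TCP, so the client has to be pointed at http://localhost:8080 rather than an https URL.

```javascript
const net = require("net");

// Parses the header block of a raw HTTP request into an object
// keyed by lower-cased header name.
function parseRequestHeaders(raw) {
  const head = raw.split("\r\n\r\n")[0];
  const lines = head.split("\r\n").slice(1); // skip the request line
  const headers = {};
  for (const line of lines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // ignore malformed lines
    const name = line.slice(0, idx).trim().toLowerCase();
    headers[name] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Starts a netcat-like listener that prints the first chunk of each
// incoming request, then replies with an empty 200 response.
function startHeaderDump(port = 8080) {
  const server = net.createServer((socket) => {
    socket.once("data", (chunk) => {
      const raw = chunk.toString("latin1");
      console.log(raw);                       // the request as sent on the wire
      console.log(parseRequestHeaders(raw));  // the same headers, parsed
      socket.end("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
    });
  });
  server.listen(port);
  return server;
}
```

Call startHeaderDump() in one terminal, then run wget http://localhost:8080 and the node-fetch script against the same address, and compare the printed Connection headers.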

Kael Kirk