
I am trying to do some simple web scraping in JavaScript to get the HTML from either the Delish or Tasty sites so I can store recipes. One example would be this page: https://www.delish.com/cooking/recipe-ideas/a27469586/baked-zucchini-recipe/ However, when I use fetch, I can't get it to work.

I actually wrote the equivalent code in Python, which I want to translate into JavaScript. It is pasted here:

import requests
url = "https://www.delish.com/cooking/recipe-ideas/a27469586/baked-zucchini-recipe/"
r = requests.get(url)
text = r.text  # decoded HTML as a str; str(r.content) would give the "b'...'" repr of the bytes

I am able to get the JavaScript code to work when I use a different site. For example, this worked for me:

fetch('https://api.github.com/users/maecapozzi')
   .then(res => console.log('response: ', res))
   .catch(console.error)

But when I tried it with the Delish URL, I got an error:

Access to fetch at 'https://www.delish.com/cooking/recipe-ideas/a27469586/baked-zucchini-recipe' from origin 'http://localhost:3000' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
index.js:1375 TypeError: Failed to fetch

I am not sure what exactly this means as I am pretty new to all this, so any help at all would be greatly appreciated!

mradey
  • Take a look at https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS. Because you are trying to fetch from a remote site, the browser doesn't allow it. For development purposes, you can use a Chrome extension that disables CORS (Options http method. – RockLegend May 31 '19 at 19:29

2 Answers


"Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin. A web application executes a cross-origin HTTP request when it requests a resource that has a different origin (domain, protocol, and port) than its own origin.

An example of a cross-origin request: The frontend JavaScript code for a web application served from http://domain-a.com uses XMLHttpRequest to make a request for http://api.domain-b.com/data.json.

For security reasons, browsers restrict cross-origin HTTP requests initiated from within scripts. For example, XMLHttpRequest and the Fetch API follow the same-origin policy. This means that a web application using those APIs can only request HTTP resources from the same origin the application was loaded from, unless the response from the other origin includes the right CORS headers."

From: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS

Basically, browsers block scripts running on one site from reading content from another site unless that other site opts in with CORS headers, and a cross-origin read is exactly what you're trying to do. Unless the server you want to access allows this kind of access, either via a permissive CORS policy or a publicly accessible API like GitHub's, you won't be able to do this from code running in the browser.

Stephen R. Smith

Browsers enforce the same-origin policy for security reasons, which also stops websites from scraping other websites client-side. CORS is the mechanism a server uses to relax that restriction.

https://api.github.com explicitly allows requests from other websites by sending the Access-Control-Allow-Origin: * header in its responses.
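
To make the header check concrete, here is a rough sketch in Python of the decision a browser makes for a simple cross-origin GET. The function name `cors_allows` and the sample header dictionaries are made up for illustration, and the real browser logic has more cases (credentials, preflight requests), so treat this as a simplification rather than the actual algorithm.

```python
def cors_allows(page_origin, response_headers):
    """Simplified version of the browser's check for a simple cross-origin GET.

    Hypothetical helper for illustration only: it ignores credentials,
    preflight requests, and everything else a real browser handles.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        # No header at all: the script is not allowed to read the response.
        return False
    # "*" allows any origin; otherwise the value must match the page's origin.
    return allowed == "*" or allowed == page_origin


# Hand-written sample headers mirroring the two cases in the question.
with_wildcard = {"Access-Control-Allow-Origin": "*"}
without_cors_header = {"Content-Type": "text/html"}

print(cors_allows("http://localhost:3000", with_wildcard))        # True
print(cors_allows("http://localhost:3000", without_cors_header))  # False
```

The second case corresponds to the error in the question: the response carried no Access-Control-Allow-Origin header, so the browser refused to hand it to the script.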

If you want to write a web scraper in JavaScript for yourself and no one else, you might be able to disable CORS checks in your browser. If your website needs to scrape another website, you can write a Python (or similar) server that scrapes the site for you and then re-hosts the content on a domain you control (sketchy, probably a bad idea).
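
For the server-side route, a minimal sketch using only Python's standard library might look like the following. Everything specific here is an assumption for illustration: the `?url=` query parameter, the host allowlist, port 8000, the User-Agent string, and the names `fetch_page`, `make_handler`, and `serve`. It fetches the page on the server, where the browser's CORS checks do not apply, and adds the Access-Control-Allow-Origin header so the page at http://localhost:3000 can read the result.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import Request, urlopen

# Assumption: only hosts listed here may be fetched, so the proxy cannot be
# pointed at arbitrary URLs.
ALLOWED_HOSTS = {"www.delish.com"}


def fetch_page(url):
    """Fetch a page server-side and return its raw bytes."""
    request = Request(url, headers={"User-Agent": "recipe-proxy-sketch"})
    with urlopen(request, timeout=10) as response:
        return response.read()


def make_handler(fetch=fetch_page, allowed_origin="http://localhost:3000"):
    """Build a handler class; `fetch` is injectable so it can be replaced in tests."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Expect requests shaped like /?url=https://www.delish.com/...
            query = parse_qs(urlparse(self.path).query)
            target = query.get("url", [""])[0]
            if urlparse(target).hostname not in ALLOWED_HOSTS:
                self.send_error(400, "url parameter missing or host not allowed")
                return
            try:
                body = fetch(target)
            except OSError:
                # urllib's URLError and HTTPError are subclasses of OSError.
                self.send_error(502, "could not fetch the target page")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            # This header is what lets the page at allowed_origin read the response.
            self.send_header("Access-Control-Allow-Origin", allowed_origin)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return ProxyHandler


def serve(port=8000):
    """Run the proxy until interrupted. Port 8000 is an arbitrary choice."""
    ThreadingHTTPServer(("localhost", port), make_handler()).serve_forever()
```

Calling `serve()` starts the proxy. The front end could then request something like http://localhost:8000/?url= followed by the URL-encoded recipe address with fetch, instead of contacting the recipe site directly.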

Matthias