I'm trying to create a webpage that displays JSON content. Rather than writing a JSON file with hundreds of entries by hand, I want to load an HTML page from a URL and convert its contents into a JSON file.

I'm very new to JavaScript and jQuery, so I'm building some practice webpages to reinforce what I've learned. For this practice project I want to access this webpage: http://dogtime.com/dog-breeds, then traverse it and display some of its elements. What I'm stuck on is how to retrieve the HTML from a given URL.

I'm currently trying this code:

//When the document is ready
$(document).ready(function() {
    //Use ajax to request this webpage
    $.get("http://tired.com/", function(data) {
        //Wrap the returned HTML in a jQuery object
        var page = $(data);
        //Put the webpage into the element with id "div"
        $("#div").html(page);
    });
});

But in the console I'm getting the error:

"XMLHttpRequest cannot load http://tired.com/. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'null' is therefore not allowed access."

I did some reading on this post: "No 'Access-Control-Allow-Origin' header is present on the requested resource" but I didn't really understand how to get a solution from it. Some possible solutions that I gathered could be:

  1. In Windows, paste this command in run window:

    chrome.exe --user-data-dir="C:/Chrome dev session" --disable-web-security

This seems like a band-aid fix that won't work long-term.

  2. Use CORS: http://www.html5rocks.com/en/tutorials/cors/

Does this only work if both the client and the server support CORS? I also couldn't understand where to put this code or how to use it, because only snippets of functions are shown, and the example doesn't seem to work.

  3. Download the HTML page(s) and parse them.

Again, this seems like a fix that avoids the problem.

This is the entirety of my code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <link href="bootstrap-3.3.6-dist/css/bootstrap.min.css" rel="stylesheet">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
    <!--JSON file where I'll be storing some content-->
    <script src="breeds.js"></script>
</head>

<script>
    //When the document is loaded
    $(document).ready(function() {
        //Use Ajax to request the webpage
        $.get("http://tired.com/", function(data) {
            //Wrap the returned HTML in a jQuery object
            var page = $(data);
            //Load the html from the webpage into the element with id "div"
            $("#div").html(page);
        });
    });
</script>

<body>
    <div id="div"></div>
</body>
</html>

I would greatly appreciate an explanation of how to make this code work. Thank you!

EDIT: I'm now using Python's Beautiful Soup to create my JSON file, but I can't get JavaScript to read it using:

$.getJSON("breeds.json", function(json) {
    console.log(json);
});

because it causes the same XMLHttpRequest error as before. I've verified that my JSON files are being created correctly by using http://www.freeformatter.com/json-validator.html. The only solution I can find is the hacky approach of renaming the .json file to a .js file and assigning the JSON content, as a string, to a global variable such as:

breeds = '{"dogBreeds": [{"size": "1", "shedding": "1", "link": "http://dogtime.com/dog-breeds/affenpinscher", "energy": "4", ....."Yorkshire Terrier", "intelligence": "3"}]}'

Which I can then read using:

window.onload = function() {
    var obj = JSON.parse(breeds);
    console.log(obj.dogBreeds[0].breedName);
}

Is there a better way to do this?

Jaitnium
  • Do you have access to the servers that the URLs you're loading belong to? – nixkuroi Jul 26 '16 at 22:32
  • If you don't have access to the servers hosting the URLs you are requesting, I'd suggest using a server-side solution (Python's Beautiful Soup comes to mind) to grab the HTML from these pages, serialize it to JSON, then serve it to your client. Then your jQuery ajax call can request data from the same domain. – morecchia808 Jul 26 '16 at 22:38
  • What is your backend? We can recommend something. – Abdennour TOUMI Jul 27 '16 at 02:32
  • @nixkuroi I don't have any special privileges for the URLs that I'm loading from (tired.com/dogtime.com). – Jaitnium Jul 27 '16 at 17:21
  • @morecchia808 That is correct, I don't have access to the servers hosting the URLs I'm requesting. I'll look into Python's Beautiful Soup. Thank you! :) – Jaitnium Jul 27 '16 at 17:21
  • @Abdennour TOUMI Not sure what constitutes a backend, but I'm testing my webpage locally right now. – Jaitnium Jul 27 '16 at 17:21
  • Ok ... an alternative is to make a proxy in your backend: `http://localhost/proxy?url=http://tired.com` – Abdennour TOUMI Jul 27 '16 at 17:28
  • @Jaitnium : take a look at https://www.npmjs.com/package/after-load – Abdennour TOUMI Jul 27 '16 at 17:29
  • @morecchia808 I've been playing around with beautiful soup and wrote a python script to get all the information I need and write it to a JSON file. The problem I'm having now is loading the JSON data from javascript. Please check my original edited post. – Jaitnium Jul 27 '16 at 21:49
  • where/how are you executing your javascript? localhost? Or are you opening your html file in the browser via File -> Open / Ctrl + O? – morecchia808 Jul 27 '16 at 23:48
  • @morecchia808 I'm executing my javascript via localhost (just dragging it to the browser). – Jaitnium Jul 28 '16 at 00:34
  • Please see my answer below. You need to run a server. – morecchia808 Jul 28 '16 at 01:11

2 Answers

CORS needs to be enabled on the server. If the server does not send the Access-Control-Allow-Origin header, your browser will refuse to hand a response from another origin to your script. That's your issue: one origin is "tired.com" and the other origin is wherever your HTML page is loaded from.

You need to understand that this is a very important feature for your own security. Starting Chrome with --disable-web-security, as you suggested, would make your code run, but it switches off the same-origin policy and opens a huge security hole at the same time. Furthermore, it would work only for those who start their browsers with that option, which is probably no one but you :)

If you do not have the option to set the CORS header on the server side, you are screwed. However, you might find a way to load the data from within another environment that doesn't care about CORS, e.g. from a server (see the proposal from morecchia808). You are not lost yet :)
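
To illustrate what "setting the CORS header on the server side" means, here is a minimal sketch in Python 3. It only applies to a server you control, which is not the case for tired.com or dogtime.com. The names `CORSRequestHandler` and `make_cors_server`, the port, and the wildcard origin are assumptions made for the example.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds a CORS header to every response."""

    def end_headers(self):
        # "*" allows any origin (an assumption for this sketch);
        # a real server would usually name specific origins instead.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_cors_server(port=8000):
    """Build, but do not start, a server for the current directory."""
    return HTTPServer(("127.0.0.1", port), CORSRequestHandler)
```

Calling `make_cors_server().serve_forever()` would start it. With that header present, a browser lets scripts from other origins read the responses.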

Jan B.

As you mentioned, the solution is to parse the remote html (Beautiful Soup is great for this) and serialize it to JSON on the server.
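
A rough sketch of that parse-and-serialize step is below. To keep it runnable without extra installs it uses the standard library's `html.parser` in place of Beautiful Soup. The `LinkCollector` class and `links_to_json` function are hypothetical names, and collecting every `<a>` element is an assumption for illustration, not the real structure of the dogtime.com page.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href and text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link being read, or None outside <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"link": dict(attrs).get("href"), "breedName": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["breedName"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def links_to_json(html):
    """Return a JSON string shaped like {"dogBreeds": [...]}."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return json.dumps({"dogBreeds": parser.links})
```

The string returned by `links_to_json` can be written to a file such as breeds.json and served next to the webpage.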

One last thing: you will continue to get the same "No 'Access-Control-Allow-Origin' header is present on the requested resource" error if you open the "index.html" file directly in the browser, because the browser treats a page opened from disk as origin 'null'. You need to serve your webpage from a web server, and one running locally is enough. Since you are already using Python, the easiest way to do this is to open a command prompt, cd into the directory where you saved your html file, and run this command:

    $ python -m SimpleHTTPServer

Then open http://localhost:8000 in your browser. The json should load just fine.
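
Note that `SimpleHTTPServer` is the Python 2 module name; on Python 3 the equivalent command is `python -m http.server`. The same idea can also be sketched as a short Python 3 script. The function name `make_static_server` is hypothetical, and the `directory` argument assumes Python 3.7 or newer.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_static_server(directory=".", port=8000):
    """Build, but do not start, a server for the files in `directory`."""
    # Bind the handler to the chosen directory instead of the working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("127.0.0.1", port), handler)
```

Calling `make_static_server().serve_forever()` from the folder that holds index.html and breeds.json serves both from the same origin, so the `$.getJSON("breeds.json", ...)` request is no longer cross-origin.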

morecchia808