I have just started using JavaScript and I want to fetch metadata from a URL. When any URL is entered into the input field, the page should pull the metadata from it. Below is my basic attempt in HTML and JavaScript; when I execute the code, it throws an error.

I have searched for alternatives, but nothing has helped. Please suggest how to achieve this functionality.

<!DOCTYPE html>
    <html>
    <body>
    <head>
      <meta name="description" content="Free Web tutorials">
      <meta name="keywords" content="HTML5,CSS,JavaScript">
      <meta name="author" content="John Doe">
      <meta content="http://stackoverflow.com/favicon.ico">
    </head>
    
    <p>Click the button to return the value of the content attribute of all meta elements.</p>
    
    <button onclick="myFunction()">Try it</button>
    
    <p id="demo"></p>
    
    <script>
    function myFunction() {
        var x = "https://www.amazon.in/"
      // var x = document.getElementsByTagName("META");
      var txt = "";
      var i;
      for (i = 0; i < x.length; i++) {
        txt = txt + "Content of "+(i+1)+". meta tag: "+x[i].content+"<br>";
      }
      
      document.getElementById("demo").innerHTML = txt;
    }
    </script>
    
    </body>
    </html>
User 28
ats demo

2 Answers

If I understand correctly, you are trying to build a metadata scraper in JavaScript.
Before going further, you need to take the CORS policy into account when requesting data from any URL.

Reference URLs:

  1. https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
  2. https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors
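To illustrate when the CORS policy comes into play: a browser treats a request as cross-origin when the scheme, host, or port of the target URL differs from that of the page making the request, and it then only exposes the response if the target server opts in (for example with an Access-Control-Allow-Origin header). The snippet below is a minimal sketch of that origin comparison; the helper name isCrossOrigin and the example.com URLs are assumptions made for illustration and are not part of the fiddle.

```javascript
// Hypothetical helper (not from the fiddle): returns true when a request
// made by the page at pageUrl to targetUrl would be cross-origin.
function isCrossOrigin(pageUrl, targetUrl) {
  // URL.origin combines scheme, host and port, which is what browsers compare.
  return new URL(pageUrl).origin !== new URL(targetUrl).origin;
}

// Same scheme, host and port: same origin, so CORS is not involved.
console.log(isCrossOrigin("https://example.com/page", "https://example.com/other"));

// Different host: cross-origin, so the target server must allow the request.
console.log(isCrossOrigin("https://example.com/page", "https://www.amazon.in/"));
```

If the second case applies and the target site does not send the required headers, the browser blocks access to the response and the AJAX error callback runs.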

JSFiddle: http://jsfiddle.net/pgrmL73h/

The fiddle demonstrates how to fetch the meta tags from a given URL. For demo purposes I used https://jsfiddle.net/ as the URL; you can change it as needed.

I followed the steps below to retrieve the META tags from the website.

  1. To retrieve the page source of a website, you first need to request that URL. You can do this with jQuery's AJAX method.
    Reference URL: https://api.jquery.com/jquery.ajax/

  2. Use jQuery's $.parseHTML method to turn the returned HTML string into DOM elements.
    Reference URL: https://api.jquery.com/jquery.parsehtml/

  3. Once the AJAX request has retrieved the page source, check each DOM element, keep only the META nodes that have a name attribute (e.g. keywords, description), and store the data in the "txt" variable.

  4. Once the AJAX request completes, display the contents of "txt" inside a paragraph tag.

JS Code:

function myFunction() {
  var txt = "";
  document.getElementById("demo").innerHTML = txt;
  // Sample URL; replace it with the value of your input field as needed.
  // $.ajax requests the URL and returns the page source, much like cURL or
  // file_get_contents (https://www.php.net/manual/en/function.file-get-contents.php) in PHP.
  // The request only succeeds if the target site allows it under the CORS policy.
  $.ajax({
    url: "https://jsfiddle.net/",
    error: function() {
      txt = "Unable to retrieve webpage source HTML";
    },
    success: function(response) {
      // The response arrives as an HTML string; $.parseHTML converts it into an array of DOM nodes.
      // Reference: https://api.jquery.com/jquery.parsehtml/
      var nodes = $.parseHTML(response);
      $.each(nodes, function(i, el) {
        // Keep only <meta> elements that carry a name attribute.
        if (el.nodeName.toLowerCase() === "meta" && $(el).attr("name") != null) {
          var name = $(el).attr("name");
          // Prefer the content attribute, fall back to value, else an empty string.
          var value = $(el).attr("content") || $(el).attr("value") || "";
          txt += name + "=" + value + "<br>";
          console.log(name, "=", value, el);
        }
      });
    },
    complete: function() {
      // Runs after success or error, so txt holds either the tags or the error message.
      document.getElementById("demo").innerHTML = txt;
    }
  });
}
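The filtering step above can also be sketched without jQuery or a DOM, which may be useful if the page source is fetched somewhere CORS does not apply (for example in Node, since CORS is enforced by browsers). This is only a rough sketch under stated assumptions: the function name extractMetaTags is hypothetical, and the regular expressions handle simple, well-formed meta tags only (an attribute value containing ">" would break it), so a real HTML parser is the safer choice for arbitrary pages.

```javascript
// Hypothetical helper (not part of the fiddle): collects name/content pairs
// from <meta> tags found in an HTML string.
function extractMetaTags(html) {
  var result = {};
  // Matches each <meta ...> tag; assumes no ">" inside attribute values.
  var tags = html.match(/<meta\b[^>]*>/gi) || [];
  // Matches attr="value", attr='value' or attr=value.
  var attrPattern = /([a-zA-Z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;

  tags.forEach(function (tag) {
    var attrs = {};
    var m;
    attrPattern.lastIndex = 0; // reset the global regex before scanning each tag
    while ((m = attrPattern.exec(tag)) !== null) {
      var value = m[2] !== undefined ? m[2] : (m[3] !== undefined ? m[3] : m[4]);
      attrs[m[1].toLowerCase()] = value;
    }
    // Same rule as the jQuery version: skip meta tags without a name attribute,
    // prefer content, fall back to value, else an empty string.
    if (attrs.name !== undefined) {
      result[attrs.name] = attrs.content || attrs.value || "";
    }
  });
  return result;
}

// Usage with the meta tags from the question's own page.
var sample =
  '<head>' +
  '<meta name="description" content="Free Web tutorials">' +
  '<meta name="keywords" content="HTML5,CSS,JavaScript">' +
  '<meta content="http://stackoverflow.com/favicon.ico">' +
  '</head>';
console.log(extractMetaTags(sample));
```

The last meta tag in the sample has no name attribute, so it is skipped, just as in the jQuery code.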
Prasad Wargad
  • Unable to get the HTML source from the URL. @Prasad Wargad – ats demo Feb 29 '20 at 04:20
  • Did you check the JSFiddle? It works and retrieves the meta tags. – Prasad Wargad Feb 29 '20 at 05:06
  • Yes, I checked it with another URL, www.amazon.in, and it does not work; it reports that it is unable to get the HTML source from the URL. – ats demo Feb 29 '20 at 10:11
  • Correct. I mentioned CORS in the answer and gave reference URLs too. Because of the browser's CORS policy you can't fetch the details from that site. You can use a CORS extension and try again. I believe the answer given here is correct for the question as asked; the CORS issue is something you need to tackle on your own. – Prasad Wargad Mar 01 '20 at 03:55
  • To bypass the CORS issue, you can add the following extension to Google Chrome: "Moesif Orign & CORS Changer (https://chrome.google.com/webstore/detail/moesif-orign-cors-changer/digfbfaphojjndkpccljibejjbppifbc)". Keep the extension running and then try fetching details from www.amazon.in again; it might work. – Prasad Wargad Mar 02 '20 at 05:35
  • This does not work for twitter . com, for example. – Sylar Oct 15 '21 at 06:02

You can use open-graph-scraper for this, see this answer for details.

fredrivett