I want to get the first image of an external webpage and display it. I use an XMLHttpRequest to fetch the webpage as a document, search that document for the first image, and then set it as the source of an image element in my app. But no image shows up. This is for a Chrome App, not a web page or website. Here is my JavaScript:

var xhr = new XMLHttpRequest();
xhr.open('GET', 'https://ab.reddit.com/', true);
xhr.responseType = 'document';
xhr.onload = function(e) {
  var ext_doc = this.response;
  var img_src = ext_doc.getElementsByTagName("img")[0];
  var img_html = document.querySelector('#TestImage2');
  img_html.src = img_src.src;
};
xhr.send();
Hobbs2000
  • Try logging `img_src` to see what (if anything) you get. – Scott Marcus Nov 17 '16 at 21:18
  • I cannot check what is logged to the console because I am on a school-administered Chromebook, and that functionality is blocked. – Hobbs2000 Nov 17 '16 at 21:21
  • Client-side web scraping is not possible [due to security issues](http://stackoverflow.com/a/31626877/6941627). You'll have to go server-side with this. You can use PhantomJS, for example, but there are many alternatives. – Fabian Schultz Nov 17 '16 at 21:22
  • I was able to retrieve a direct image this way, though, and I am able to parse the document into a string and display the HTML. – Hobbs2000 Nov 17 '16 at 21:25
  • Well, right now you have a typo: `scr` instead of `src`. Also, you should warn your school sysadmins that they forgot to lock down installing unpacked extensions. – Xan Nov 17 '16 at 21:46
  • I have fixed the typo, but still no image shows up. – Hobbs2000 Nov 17 '16 at 22:03

1 Answer

I figured out the issue. In a Chrome App, I cannot set the image's `src` directly to the URL found in the external HTML document. Instead, I have to send a second XMLHttpRequest for that image URL, retrieve the image as a blob, and then set the image's `src` to `window.URL.createObjectURL(this.response)`, where `this.response` is the image blob. I am not quite sure why it has to be done this way; it is probably for security reasons.

I also put this into its own function. `pgURL` is the URL of the web page to retrieve images from, `index` is the zero-based index of the wanted image among all the images on that page, and `display` is the image HTML element whose `src` will be changed.

function getImage(pgURL, index, display)
{
  // First request: fetch the external page and have it parsed as a document
  var xhr = new XMLHttpRequest();
  xhr.open('GET', pgURL, true);
  xhr.responseType = 'document';
  xhr.onload = function(e) {
    var doc = this.response;
    var img = doc && doc.getElementsByTagName("img")[index];
    if (!img) {
      return; // The page could not be parsed, or it has no image at that index
    }
    // Second request: the external URL cannot be used directly as the image's src,
    // so fetch the image itself
    var xhr2 = new XMLHttpRequest();
    xhr2.open('GET', img.src, true);
    xhr2.responseType = 'blob'; // Retrieve the image as a blob
    xhr2.onload = function(e) {
      // Finally, point the image element at a blob: URL for the retrieved image
      display.src = window.URL.createObjectURL(this.response);
    };
    xhr2.send();
  };
  xhr.send();
}

REMINDER! This is for a Chrome App, not a website.
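The image step could also be written with promises. The following is only a sketch of the same blob-URL technique, not a drop-in replacement: it uses `fetch` instead of a second XMLHttpRequest, rejects on a failed image request, and revokes the blob URL once the image has loaded. The names `loadImageAsObjectUrl`, `showImage`, and the `fetchImpl` parameter are hypothetical and were introduced here for illustration; `fetchImpl` exists only so the functions can be tried without a network.

```javascript
// Sketch: fetch an image as a blob and resolve to a blob: URL for it.
// `fetchImpl` is a hypothetical injection point; it defaults to the global fetch.
function loadImageAsObjectUrl(src, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch(src).then(function (response) {
    if (!response.ok) {
      // Reject instead of silently showing nothing
      throw new Error('Image request failed with status ' + response.status);
    }
    return response.blob();
  }).then(function (blob) {
    return URL.createObjectURL(blob);
  });
}

// Point an image element at the blob URL, and release the URL once the
// image has loaded so the blob's memory can be reclaimed.
function showImage(display, src, fetchImpl) {
  return loadImageAsObjectUrl(src, fetchImpl).then(function (objectUrl) {
    display.onload = function () {
      URL.revokeObjectURL(objectUrl);
    };
    display.src = objectUrl;
    return objectUrl;
  });
}
```

Under these assumptions, the nested request in `getImage` could be replaced by a call such as `showImage(display, img_src.src)` inside the first request's `onload` handler.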

Hobbs2000
  • Oh, right, Chrome App - I missed that the first time I read your question. Here's a full [documentation article](https://developer.chrome.com/apps/app_external) on the topic! (spoiler: it presents the very same solution) – Xan Nov 18 '16 at 16:35