
I am writing a Chrome extension and I am trying to detect all images on a webpage.

In my JS code, by "all images" I mean:

  1. Images that are present once the webpage has loaded
  2. Images that are used as backgrounds (set either in CSS or in inline HTML styles)
  3. Images that are loaded after the webpage has finished loading. For instance, on a Google Images search it is easy to find all the images, but once you click on one to enlarge it, the enlarged image is not detected. The same happens when browsing social media websites.

My current code finds the initial images (1) easily, but I am struggling with (2) and (3).

Here is my current code in contentScript.js:

var images = document.getElementsByTagName('img');
for (var i = 0, l = images.length; i < l; i++) {
    //Do something
}

How should I modify it so that it can also detect the other images (2 and 3)?

I have seen a couple of questions on (2) on SO, like this one or this one, but none of the answers fully satisfies my second requirement, and none of them addresses the third.

LBes
  • Regarding point 3: couldn't you just use setInterval() and check whether any new images are in the DOM? – filip Sep 21 '18 at 13:31
  • @filip that seems quite computationally heavy (especially since I want new images to be detected right away). I was thinking more of catching events. Isn't there an event that tells me something has been added to the DOM, so I can just check whether it contains an image? – LBes Sep 21 '18 at 13:33
  • Found something called MutationObserver, which checks for changes in the DOM (for example adding an <img> tag) https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver @LBes – filip Sep 21 '18 at 13:38
  • Interesting @filip, will give this a try when I'm back from work – LBes Sep 21 '18 at 13:41

3 Answers


Live collection of imgs

As @vsync said, finding all HTML images is as simple as var images = document.images. This is a live collection, so images that are dynamically added to or removed from the page are automatically reflected in it.
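Because the collection is live, reading it again later includes images added since the last read, but you still need to work out which entries are new. A minimal sketch, assuming you re-read the collection from some trigger of your own (a timer or an observer callback): the hypothetical helper takeNewSources keeps a Set of URLs already seen and returns only the unseen ones. It only reads currentSrc/src from each entry, so plain objects can stand in for <img> elements when testing.

```javascript
// Hypothetical helper: return the image URLs in `imgs` that have not been
// seen before, remembering them in `seen` (a Set) for the next call.
// `imgs` can be document.images or any array-like of objects with src/currentSrc.
function takeNewSources(imgs, seen) {
  const fresh = [];
  for (const img of Array.from(imgs)) {
    // currentSrc is the candidate the browser picked (e.g. from srcset); fall back to src
    const url = img.currentSrc || img.src;
    if (url && !seen.has(url)) {
      seen.add(url);
      fresh.push(url);
    }
  }
  return fresh;
}

// In a content script (only runs where a DOM exists):
if (typeof document !== "undefined") {
  const seen = new Set();
  console.log(takeNewSources(document.images, seen)); // initial images
  // ...later, e.g. from a MutationObserver callback:
  // console.log(takeNewSources(document.images, seen)); // only the new ones
}
```

Keying on the URL means the same image shown twice is reported once; key on the element itself instead if you need every occurrence.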

Extracting background images (inline and CSS)

There are a few ways to check for background images, but perhaps the most reliable is to iterate over all of the page's elements and use window.getComputedStyle to check whether each element's backgroundImage is something other than none. This catches background images set both inline and in stylesheets.

var images = [];
var elements = document.body.getElementsByTagName("*");
Array.prototype.forEach.call( elements, function ( el ) {
    var style = window.getComputedStyle( el );
    if ( style.backgroundImage !== "none" ) {
        // strip url( ... ) and any surrounding quotes
        images.push( style.backgroundImage.slice( 4, -1 ).replace(/['"]/g, "") );
    }
});

The backgroundImage value returned by window.getComputedStyle is the full computed background-image value, in the form url(...), so you need to remove the leading url( and the trailing ). You also need to remove any " or ' surrounding the URL. You can do both with backgroundImage.slice( 4, -1 ).replace(/['"]/g, "").
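The slice( 4, -1 ) approach assumes the value is exactly one url(...). A computed background-image can also list several comma-separated layers (url("a.png"), url("b.png")) or include gradients such as linear-gradient(...), which slicing would turn into a broken string. A slightly more defensive sketch pulls every url(...) out with a regular expression; the function name extractCssUrls is hypothetical.

```javascript
// Hypothetical helper: return every URL inside url(...) in a CSS value such as
// the one returned by getComputedStyle(el).backgroundImage.
function extractCssUrls(value) {
  const urls = [];
  if (!value || value === "none") return urls;
  // url( + optional whitespace + optional quote + lazily matched URL
  // + the same quote + optional whitespace + )
  const re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  let match;
  while ((match = re.exec(value)) !== null) {
    urls.push(match[2]);
  }
  return urls;
}

extractCssUrls('url("a.png"), linear-gradient(red, blue), url(b.png)');
// returns ["a.png", "b.png"]
```

In the loop you would then push every URL from extractCssUrls( window.getComputedStyle( el ).backgroundImage ) instead of the single sliced string.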

Only start checking once the DOM is ready, otherwise your initial scan might miss elements.
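One catch in a Chrome extension: content scripts run at "document_idle" by default, which can be after DOMContentLoaded has already fired, in which case a DOMContentLoaded listener added by the script never runs. A small sketch that checks document.readyState first; the helper name whenDomReady is hypothetical, and it takes the document as a parameter so it can be exercised with a stand-in object.

```javascript
// Hypothetical helper: run `fn` once the DOM has been parsed.
// If parsing is already done (readyState "interactive" or "complete"),
// run it immediately instead of waiting for an event that has already fired.
function whenDomReady(doc, fn) {
  if (doc.readyState === "loading") {
    doc.addEventListener("DOMContentLoaded", fn, { once: true });
  } else {
    fn();
  }
}

// In a content script:
if (typeof document !== "undefined") {
  whenDomReady(document, function () {
    console.log("DOM ready, start scanning for images");
  });
}
```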

Dynamically added background images

This does not give you a live list, so you will need a MutationObserver to watch the document and check any newly added elements for a backgroundImage.

When configuring your observer, make sure your MutationObserver config has childList and subtree set to true. This lets it watch all descendants of the observed element (in your case the body), not just its direct children.

var body = document.body;
var callback = function( mutationsList, observer ) {
    for ( var mutation of mutationsList ) {
        if ( mutation.type === 'childList' ) {
            // the newly inserted nodes are in mutation.addedNodes (this can
            // include text nodes), so iterate over the element nodes there
            // as in the code sample above
        }
    }
};
var observer = new MutationObserver( callback );
var config = { attributes: false,
               childList: true,
               subtree: true };
observer.observe( body, config );
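In that callback, each record's mutation.addedNodes holds only the nodes that were inserted, whereas mutation.target.children would be every child of the parent, old and new. An inserted node may also be a text node, or a container that already holds images several levels down. A minimal sketch of walking them; collectFromAddedNodes is a hypothetical name, and getStyle is passed in (window.getComputedStyle in a page) so the walk can be tested with plain objects outside a browser.

```javascript
// Hypothetical helper: given the addedNodes of a MutationRecord, collect every
// <img> and every element with a background image, including descendants of
// added containers. getStyle is (el) => window.getComputedStyle(el) in a page.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE

function collectFromAddedNodes(addedNodes, getStyle) {
  const imgs = [];
  const bgElements = [];
  const visit = (el) => {
    if (el.tagName === "IMG") imgs.push(el);
    else if (getStyle(el).backgroundImage !== "none") bgElements.push(el);
    for (const child of Array.from(el.children)) visit(child);
  };
  for (const node of Array.from(addedNodes)) {
    if (node.nodeType === ELEMENT_NODE) visit(node); // skip text/comment nodes
  }
  return { imgs, bgElements };
}

// In a content script:
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  new MutationObserver(function (mutations) {
    for (const m of mutations) {
      const found = collectFromAddedNodes(m.addedNodes, (el) => window.getComputedStyle(el));
      console.log(found.imgs.length, found.bgElements.length);
    }
  }).observe(document.documentElement, { childList: true, subtree: true });
}
```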

Since searching for background images requires you to check every element in the DOM, you might as well check for <img>s at the same time, rather than using document.images.

Code

Modify the code above so that, in addition to checking whether each element has a background image, it checks whether the element's tag name is IMG. You should also put it in a function that runs when the DOM is ready.

UPDATE: To differentiate between images and background images, you could push them to different arrays, for example images and bg_images. To also identify the parents of images, you could push image.parentNode to a third array, e.g. image_parents.

var images = [],
    bg_images = [],
    image_parents = [];

/* Sort one element into images / image_parents or bg_images */
function checkElement( el ) {
    var style = window.getComputedStyle( el );
    if ( el.tagName === "IMG" ) {
        images.push( el.src ); // save image src
        image_parents.push( el.parentNode ); // save image parent
    } else if ( style.backgroundImage && style.backgroundImage !== "none" ) {
        bg_images.push( style.backgroundImage.slice( 4, -1 ).replace(/['"]/g, "") ); // save background image url
    }
}

function init() {
    var body = document.body;

    /* When the DOM is ready find all the images and background images
        initially loaded */
    Array.prototype.forEach.call( body.getElementsByTagName("*"), checkElement );

    /* MutationObserver callback to add images when the body changes */
    var callback = function( mutationsList, observer ) {
        for ( var mutation of mutationsList ) {
            if ( mutation.type === 'childList' ) {
                // only the newly inserted nodes, not every child of the parent
                Array.prototype.forEach.call( mutation.addedNodes, function ( node ) {
                    if ( node.nodeType !== Node.ELEMENT_NODE ) return; // skip text/comment nodes
                    checkElement( node );
                    // an inserted container may already hold images
                    Array.prototype.forEach.call( node.getElementsByTagName("*"), checkElement );
                } );
            }
        }
    };
    var observer = new MutationObserver( callback );
    var config = { attributes: false,
                   childList: true,
                   subtree: true };

    observer.observe( body, config );
}

/* Content scripts run at "document_idle" by default, which can be after
   DOMContentLoaded has fired, so only wait for it if still loading */
if ( document.readyState === "loading" ) {
    document.addEventListener( 'DOMContentLoaded', init );
} else {
    init();
}
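The config above uses attributes: false, so the observer is not notified when a page reuses an existing element and only changes its src, srcset, style, or class. That may be one reason an enlarged image goes undetected on some sites, although nothing in this thread confirms it for Google Images. A hedged sketch of also watching those attributes with attributeFilter; elementsToRecheck is a hypothetical name, and watching style/class can fire often on busy pages. A class change on an ancestor can also alter descendants' backgrounds, which this does not re-check.

```javascript
// Watch for new nodes *and* for attribute changes that can swap an image
// without inserting a new element.
const observerConfig = {
  childList: true,
  subtree: true,
  attributes: true,
  attributeFilter: ["src", "srcset", "style", "class"],
};

// Hypothetical handler: the elements worth re-checking for one MutationRecord.
function elementsToRecheck(mutation) {
  if (mutation.type === "attributes") return [mutation.target];
  if (mutation.type === "childList") {
    // element nodes only (nodeType 1); descendants still need walking
    return Array.from(mutation.addedNodes).filter((n) => n.nodeType === 1);
  }
  return [];
}

if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  new MutationObserver(function (mutations) {
    for (const m of mutations) {
      for (const el of elementsToRecheck(m)) {
        // ...run the IMG / background-image check from the answer on el
      }
    }
  }).observe(document.body, observerConfig);
}
```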
jla
  • The first part of your answer states that "var images = document.images. This will be a live list so any images that are dynamically added or removed from the page will be automatically reflected in the list." I have that already and it doesn't work at all on newly loaded content (cf. clicking an image on Google Images, or scrolling on social media) – LBes Oct 09 '18 at 13:35
  • @LBes interesting, the spec indicates it should be a live collection. `document.images` is quite an old standard, the HTML spec suggests `getElementsByTagName("img")` as another option. However using the `MutationObserver` solution and checking changed elements for `IMG` tag names as well as background images is likely to be the most robust and complete solution. – jla Oct 09 '18 at 13:56
  • which is what the second answer suggested, but see my comments there. – LBes Oct 09 '18 at 13:57
  • @LBes The error `is not of type 'Node'` means that the element to which you're attaching the observer does not exist when your code runs. As suggested in my answer use `document.addEventListener('DOMContentLoaded', function () {` to wait till the DOM loads; if the element loads after the DOM see https://stackoverflow.com/questions/40398054/observe-on-mutationobserver-parameter-1-is-not-of-type-node for a solution that polls until the element is ready. – jla Oct 09 '18 at 14:04
  • Just tried the code you suggested in my content_script.js of the chrome extension and just added a simple console.log() at the beginning of the addEventListener() function to say "DOM ready" and this is never called apparently @jla – LBes Oct 10 '18 at 14:27
  • @LBes I am unfamiliar with the intricacies of chrome extensions, https://stackoverflow.com/questions/43233115/chrome-content-scripts-arent-working-domcontentloaded-listener-does-not-execut indicates how to get content scripts to run when the DOM is loaded but you may have to see what works for you. In any case, the best modern way of determining changes to the DOM is definitely `MutationObserver`s, and your error indicates that you need some way of determining when the element to which you attach your observer is ready. – jla Oct 10 '18 at 23:48
  • alright, with those pointers it seems to work much better. However, I just tried it on Google Images, and when scrolling to see further images I don't get any new images added to the list, which is quite surprising. It works like a charm on Facebook though, apparently. Additional question: is there any way to modify the above code to get the DOM element containing the image instead of the image src, and then differentiate later between an IMG and a background image when I want to access the src? – LBes Oct 11 '18 at 12:07
  • @LBes answer updated in the Code section to allow distinguishing between images, background images and image parent elements. That's interesting that Google images didn't trigger a mutation observer event. Was the `subtree` property in the `config` variable for the mutation observer set to true? I've noticed that Google image thumbnails are often base64 encoded rather than statically linked, but that shouldn't make any difference to the mutation observer event. – jla Oct 11 '18 at 13:38
  • thanks for the update. Yes it was correctly set to true... I also don't understand why it wouldn't work. Accepting the answer anyway as it is a very specific case, but I'd still like to figure it out. If you have ideas, feel free to send them my way :) – LBes Oct 11 '18 at 13:41
  • Please check your code's syntax again: https://stackoverflow.com/questions/58053728/ The first and third code blocks have many syntax errors. – CertainPerformance Sep 22 '19 at 22:08

For HTML images (which already exist by the time you run this):

document.images

For CSS images:

You would probably need to use a regex on the page's CSS (either inline or in external files), but this is tricky because you would need to build the full URLs out of the relative paths, and that might not always work.

See the question "Getting all css used in html file".
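On the relative-path problem: URLs inside a stylesheet resolve against the stylesheet's own URL, not the page's, and the standard URL constructor can do that resolution. A sketch, assuming you already have the stylesheet text and its URL (for example a CSSStyleSheet's href); the function name absoluteCssUrls and the example.com URLs are hypothetical, and the regex is a simplification that ignores CSS comments and escaped characters. Reading the rules of a cross-origin stylesheet from script is typically blocked, which is one reason this will not always work.

```javascript
// Hypothetical helper: find url(...) references in raw CSS text and resolve
// each one against the stylesheet's base URL.
function absoluteCssUrls(cssText, baseUrl) {
  const result = [];
  const re = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  let match;
  while ((match = re.exec(cssText)) !== null) {
    try {
      result.push(new URL(match[2], baseUrl).href);
    } catch (e) {
      // skip values the URL parser rejects
    }
  }
  return result;
}

const css = '.hero { background: url("../img/hero.jpg") no-repeat; }\n' +
            ".icon { background-image: url(/icons/a.svg); }";
absoluteCssUrls(css, "https://example.com/static/css/site.css");
// returns ["https://example.com/static/img/hero.jpg", "https://example.com/icons/a.svg"]
```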


For delayed-loaded images:

You can use a mutation observer, as @filip suggested in his answer.

vsync
  • Ok thanks for the pointers. +1. I had this fear that it would not always work indeed. – LBes Sep 21 '18 at 14:36

This should solve your problem (3). I used a MutationObserver.

I observe targetNode for changes and register a callback that runs whenever a change happens.

In your case, targetNode should be the root element, so that changes anywhere in the document are observed.

In the callback I check whether the mutation added a node with the "IMG" tag.

    // Assumes the page has an element with id="root"; a content script that
    // must work on any page can use document.documentElement instead
    const targetNode = document.getElementById("root");

    // Options for the observer (which mutations to observe)
    const config = { attributes: true, childList: true, subtree: true };

    // Callback function to execute when mutations are observed
    const callback = function(mutationsList, observer) {
        for (const mutation of mutationsList) {
            // addedNodes is empty for attribute changes and removals,
            // so loop over it instead of reading addedNodes[0]
            for (const node of mutation.addedNodes) {
                if (node.tagName === "IMG") {
                    console.log("New Image added in DOM!");
                }
            }
        }
    };

    // Create an observer instance linked to the callback function
    const observer = new MutationObserver(callback);

    // Start observing the target node for configured mutations
    observer.observe(targetNode, config);
filip
  • Ok, seems like the way to go. Upvote! But I get a "Failed to execute 'observe' on 'MutationObserver': parameter 1 is not of type 'Node'." on the line observer.observe(targetNode, config); – LBes Sep 21 '18 at 14:35
  • @LBes what did you set the targetNode var to? You can't just use document.body or something like that. I gave the tag an id and then got it with document.getElementById("root"); – filip Sep 21 '18 at 14:56
  • I did use document.body. But then what do you suggest using? Not sure I understand what you mean – LBes Sep 21 '18 at 15:02
  • @LBes HTML: .... JS: targetNode = document.getElementById("root"); – filip Sep 21 '18 at 16:21
  • that cannot work for me. It's a chrome extension, it has to work on every page :s – LBes Sep 21 '18 at 16:30
  • Just tested with just document. Works without errors. const targetNode = document; @LBes – filip Sep 21 '18 at 17:22
  • don't you get errors at all? I've just tried in google images and I get "Cannot read property 'tagName' of undefined at MutationObserver.callback" – LBes Sep 21 '18 at 22:42
  • I tested it with my own site where I add images with a button. It works without errors. Did you try logging just the mutation? console.log(mutation); @LBes – filip Sep 22 '18 at 08:38
  • just tested again on Google Images. When I click on one image to get a bigger version of it (which adds an image to the DOM), the callback function is called multiple times and only one of these calls is valid (with one added node). All the other calls have 0 added nodes, hence the error I get, I suppose. I'm trying to figure out why the callback is called so many times. I tried filtering out the cases with no added nodes with if(mutation.addedNodes.length == 0){ return ; } but this is not good enough, as it then also seems not to notify me when I click on an image to enlarge it – LBes Sep 22 '18 at 11:44
  • Wouldn't know how to fix this then, sorry @LBes – filip Sep 26 '18 at 17:34