
I am developing a web scraper using JavaFX WebView. For scraping purposes, I don't need images to be loaded. While a page is loading, WebKit spawns lots of UrlLoader threads, so I think disabling images would save a lot of system resources. Does anyone know how to disable automatic image loading in WebView?

John Vu
Hey, I think I ran into a similar case, but with one difference: how do I stop only certain images from being shown, rather than disabling all images? @John Vu – gumuruh Feb 19 '17 at 15:07

2 Answers


Solution Approach

Define your own protocol handler for http and filter out anything with an image mime type or content.

URL.setURLStreamHandlerFactory(new HandlerFactory());

Sample Code

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.scene.layout.StackPane;
import javafx.scene.web.*;
import javafx.stage.Stage;

import java.io.IOException;
import java.net.*;

public class LynxView extends Application {
    private static final String BLANK_IMAGE_LOC =
            "https://upload.wikimedia.org/wikipedia/commons/c/ce/Transparent.gif";
    public static final String WEBSITE_LOC =
            "http://fxexperience.com";
    public static final String IMAGE_MIME_TYPE_PREFIX =
            "image/";

    @Override
    public void start(Stage stage) throws Exception {
        WebView webView = new WebView();
        WebEngine engine = webView.getEngine();
        engine.load(WEBSITE_LOC);

        stage.setScene(new Scene(new StackPane(webView)));
        stage.show();
    }

    public static void main(String[] args) throws IOException {
        URL.setURLStreamHandlerFactory(new URLStreamHandlerFactory() {
            @Override
            public URLStreamHandler createURLStreamHandler(String protocol) {
                if ("http".equals(protocol)) {
                    return new sun.net.www.protocol.http.Handler() {
                        @Override
                        protected URLConnection openConnection(URL url, Proxy proxy) throws IOException {
                            String[] fileParts = url.getFile().split("\\?");
                            String contentType = URLConnection.guessContentTypeFromName(fileParts[0]);
                            // This small hack is required because guessContentTypeFromName does not recognize svg:
                            // in Java 8, svg is not listed in $JAVA_HOME/lib/content-types.properties.
                            if (fileParts[0].endsWith(".svg")) {
                                contentType = "image/svg+xml";
                            }
                            System.out.println(url.getFile() + " : " + contentType);
                            if ((contentType != null && contentType.startsWith(IMAGE_MIME_TYPE_PREFIX))) {
                                return new URL(BLANK_IMAGE_LOC).openConnection();
                            } else {
                                return super.openConnection(url, proxy);
                            }
                        }
                    };
                }

                return null;
            }
        });

        Application.launch();
    }
}

Sample Notes

The sample only probes the filename to determine the content type and not the input stream attached to the url. Though probing the input stream would be a more accurate way to determine if the resource the url is connected to is actually an image or not, it is slightly less efficient to probe the stream, so the solution presented trades accuracy for efficiency.
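To make the trade-off concrete, here is a minimal sketch contrasting the two probing styles using the standard `URLConnection.guessContentTypeFromName` and `URLConnection.guessContentTypeFromStream` methods. The class name `ContentTypeProbe`, its helper methods, and the sample path are hypothetical and chosen only for illustration. Probing a real resource by stream would require opening the connection and reading its first bytes, which is the extra cost the note above refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

public class ContentTypeProbe {
    /** Guess from the file name only; strips any query string first. No I/O on the resource. */
    static String fromName(String file) {
        String path = file.split("\\?")[0];
        return URLConnection.guessContentTypeFromName(path);
    }

    /** Guess from the leading bytes; the stream must support mark/reset. */
    static String fromStream(InputStream in) throws IOException {
        return URLConnection.guessContentTypeFromStream(in);
    }

    /** Same image test the sample answer applies to the guessed type. */
    static boolean isImage(String contentType) {
        return contentType != null && contentType.startsWith("image/");
    }

    public static void main(String[] args) throws IOException {
        // The first bytes of a GIF file, padded with zeros for the probe.
        byte[] gifHeader = {'G', 'I', 'F', '8', '9', 'a', 1, 0, 1, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(fromName("/img/logo.png?v=2"));
        System.out.println(fromStream(new ByteArrayInputStream(gifHeader)));
    }
}
```

The name-based guess can be wrong for URLs such as `/image?id=42`, which have no extension at all; the stream-based guess would catch those, at the price of starting the download.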

The provided solution only handles locations served over the http protocol, not locations served over https.

The provided solution uses the sun.net.www.protocol.http.Handler class, an internal JDK class that may not be publicly visible in Java 9 (so the solution might not work on Java 9).

The URLStreamHandlerFactory is a global setting for the JVM and can be set only once, so once it is set, it stays that way (e.g. images requested over http through any java.net.URL connection in the process will be replaced, not only those requested by the WebView).
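As a small illustration of the JVM-wide, set-once behavior: `URL.setURLStreamHandlerFactory` is documented as callable at most once per JVM, and a second call throws an `Error`. The sketch below wraps the call in a hypothetical helper, `tryInstall` (the class and method names are not from the original answer), so that application code can detect whether some other component already installed a factory.

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;

public class FactoryOnceDemo {
    /**
     * Tries to install the given factory.
     * Returns false if a factory has already been installed in this JVM.
     */
    static boolean tryInstall(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error alreadyDefined) {
            // Catching Error is normally discouraged; it is done here only because
            // this is how the JDK signals that a factory is already defined.
            return false;
        }
    }

    public static void main(String[] args) {
        // A factory that returns null for every protocol defers to the default handlers.
        System.out.println("first install:  " + tryInstall(protocol -> null));
        System.out.println("second install: " + tryInstall(protocol -> null));
    }
}
```

Because the setting cannot be undone, it is probably worth installing the factory early in `main`, before any other library gets the chance to.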

The sample solution returns a blank (transparent) image, which it loads over the net. For efficiency, the image could be loaded as a resource from the classpath instead of over the net.
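One way to avoid the network round trip, sketched here with a swapped-in technique: instead of a classpath resource, the blank image is embedded in memory and served by a small custom `URLConnection`. The class name `BlankImageConnection` is hypothetical, and the Base64 string is a commonly circulated 1x1 transparent GIF. Whether WebView accepts a plain `URLConnection` (rather than an `HttpURLConnection`) for an http URL is an assumption that has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.Base64;

public class BlankImageConnection extends URLConnection {
    // Commonly circulated 1x1 transparent GIF, Base64-encoded (assumed valid for this sketch).
    static final byte[] BLANK_GIF = Base64.getDecoder().decode(
            "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");

    public BlankImageConnection(URL url) {
        super(url);
    }

    @Override
    public void connect() {
        connected = true; // nothing to open: the content is already in memory
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(BLANK_GIF);
    }

    @Override
    public String getContentType() {
        return "image/gif";
    }

    @Override
    public int getContentLength() {
        return BLANK_GIF.length;
    }
}
```

In the sample's handler, the image branch would then become `return new BlankImageConnection(url);`. A classpath variant would instead call `getResource(...)` on a class and open a connection on the returned URL.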

You could return a null connection rather than a connection to a blank image. If you do, the web view code will start reporting null pointer exceptions to the console because it is not getting the URL connection it expects, and it will replace all images with an x image to show that the image is missing (I wouldn't really recommend an approach that returns a null connection).

jewelsea
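import java.io.IOException;
import java.net.*;

// Factory that routes http URLs through the image-filtering handler below.
// The class header is missing from the original post; the name HandlerFactory is assumed here.
class HandlerFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if ("http".equals(protocol)) {
            return new URLFortuneHandler();
        }
        return null; // fall back to the JVM's default handler for other protocols
    }
}

// Handler that substitutes a replacement connection for .jpg and .png URLs.
public class URLFortuneHandler extends sun.net.www.protocol.http.Handler {
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        String path = url.getPath(); // path only, so a query string cannot hide the extension
        String ext = path.substring(path.lastIndexOf('.') + 1);
        if ("jpg".equals(ext) || "png".equals(ext)) {
            return somethinghere; // placeholder from the original post: a connection to substitute content
        } else {
            return super.openConnection(url);
        }
    }
}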
John Vu