So, after 2 days of digging into the problem and learning about the URLConnection class, I was finally able to come up with my own (presumably crude) implementation, which I'd like to share here in case anyone with as little knowledge as I had happens to stumble over this thread. Again, the basic idea was to place only specific file types in a cache, not everything. I chose to store jpg, png, gif and js files as I thought them to be the heaviest ones to load, though any other file format should be possible, too.
First things first: browser.getEngine().setUserDataDirectory(...) definitely does NOT do the job. I still have no clue what it is good for, but it's certainly not storing image files %)
Instead, what I did was basically create 5 classes:
- CachedResource: Consists of a byte[] array holding a resource's raw data and some meta info (header fields, lastModified).
- ResourceCache: Holds all currently cached resource objects.
- MyHttpUrlConnection (extends sun.net.www.protocol.http.HttpURLConnection): A wrapper class that is responsible for retrieving the file that its given URL points to. It does all the network magic.
- CachedUrlConnection (extends java.net.URLConnection): An (almost) empty implementation that already has all the data we need and only waits for the system to call it.
- MyUrlConnectionHandler (extends sun.net.www.protocol.http.Handler): This class is registered at application start and decides when to use which URLConnection (see below).
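On the "registered at application start" part: one common way to install a custom handler is URL.setURLStreamHandlerFactory, and the sketch below assumes that approach (the thread referenced later may describe a different one). StandInHandler and HandlerRegistration are hypothetical names used only so the example compiles on its own; in the real application the factory would return a MyUrlConnectionHandler instead.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

/** Hypothetical stand-in for MyUrlConnectionHandler, so the sketch is self-contained. */
class StandInHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        // A do-nothing connection; the real handler would decide between
        // a cached and a regular HTTP connection here.
        return new URLConnection(url) {
            @Override
            public void connect() {
                connected = true;
            }
        };
    }
}

final class HandlerRegistration {
    /** Call once at application start, before any http URL is opened. */
    static void install() {
        // Returning null for other protocols lets the JVM fall back to its built-in handlers.
        URL.setURLStreamHandlerFactory(protocol ->
                "http".equals(protocol) ? new StandInHandler() : null);
    }
}
```

Note that the JVM accepts a stream handler factory only once; a second call to URL.setURLStreamHandlerFactory throws an Error, so this belongs in the startup code.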
The ResourceCache, CachedResource and CachedUrlConnection classes are fairly small and easy to write. I designed the resource cache to map each URL to its corresponding CachedResource object, thus: ConcurrentHashMap<URL, CachedResource> plus a getter and an addResource(...) function. I added some other stuff to it, like storing files locally, but that would lead off-topic.
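A minimal sketch of these two classes might look like this. The method names follow the calls made in the handler code further down (getInstance, isCachableURL, getCachedResource, addCachedResource, getByteData, getLastModified); the field names, the fileName parameter, the singleton layout and the extension check are assumptions made for illustration, and the local file storage is left out.

```java
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Raw bytes of one resource plus its meta info. */
class CachedResource {
    private final String fileName;                         // assumed: taken from url.getFile()
    private final byte[] byteData;
    private final Map<String, List<String>> headerFields;
    private final long lastModified;

    CachedResource(String fileName, byte[] byteData,
                   Map<String, List<String>> headerFields, long lastModified) {
        this.fileName = fileName;
        this.byteData = byteData;
        this.headerFields = headerFields;
        this.lastModified = lastModified;
    }

    String getFileName() { return fileName; }
    byte[] getByteData() { return byteData; }
    Map<String, List<String>> getHeaderFields() { return headerFields; }
    long getLastModified() { return lastModified; }
}

/** Holds all currently cached resources, keyed by their URL. */
final class ResourceCache {
    private static final ResourceCache INSTANCE = new ResourceCache();
    // Assumed list of extensions, matching the file types named in the text.
    private static final List<String> EXTENSIONS = Arrays.asList(".jpg", ".png", ".gif", ".js");

    private final Map<URL, CachedResource> resources = new ConcurrentHashMap<>();

    private ResourceCache() { }

    static ResourceCache getInstance() { return INSTANCE; }

    /** True if the URL's path (query string ignored) ends with one of the extensions. */
    static boolean isCachableURL(URL url) {
        String path = url.getPath().toLowerCase(Locale.ROOT);
        return EXTENSIONS.stream().anyMatch(path::endsWith);
    }

    CachedResource getCachedResource(URL url) { return resources.get(url); }

    void addCachedResource(URL url, CachedResource resource) { resources.put(url, resource); }
}
```

One thing to keep in mind with URL keys: URL#hashCode and URL#equals may resolve the host name, so map lookups can block on DNS.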
I then implemented the CachedUrlConnection class as follows:
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.List;
    import java.util.Map;

    public class CachedUrlConnection extends URLConnection {
        private CachedResource resource;
        private ByteArrayInputStream inputStream;

        /* Constructors */
        public CachedUrlConnection(URL url, CachedResource resource) throws IOException {
            super(url);
            this.resource = resource;
            this.inputStream = new ByteArrayInputStream(resource.getByteData());
        }

        @Override
        public void connect() throws IOException {
            // No need to do anything: the data is already in memory.
        }

        /* Object Methods */
        /* Getters and Setters */
        @Override
        public String getHeaderField(int index) { ... }

        @Override
        public String getHeaderField(String key) { ... }

        @Override
        public Map<String, List<String>> getHeaderFields() { ... }

        @Override
        public InputStream getInputStream() throws IOException {
            return inputStream; // <---- Here, the system can grab the data.
        }
    }
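The header methods are elided above. As a rough idea of what they could do, here is a sketch under the assumption that the stored header fields are the Map<String, List<String>> returned by the original connection. HeaderBackedConnection is a hypothetical name, not the class from this post. The lookup ignores case because URLConnection's convenience getters such as getContentType() ask for lower-case names like "content-type"; returning the last value of a list is an assumption about how the values are ordered.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

/** Hypothetical connection that answers header queries from a stored map. */
class HeaderBackedConnection extends URLConnection {
    private final Map<String, List<String>> headers;
    private final byte[] data;

    HeaderBackedConnection(URL url, Map<String, List<String>> headers, byte[] data) {
        super(url);
        this.headers = headers;
        this.data = data;
    }

    @Override
    public void connect() {
        connected = true; // nothing to open, everything is in memory
    }

    @Override
    public String getHeaderField(String key) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey(); // may be null (HTTP status line)
            boolean match = (name == null) ? key == null : name.equalsIgnoreCase(key);
            List<String> values = entry.getValue();
            if (match && values != null && !values.isEmpty()) {
                return values.get(values.size() - 1);
            }
        }
        return null; // same as URLConnection's default for unknown fields
    }

    @Override
    public Map<String, List<String>> getHeaderFields() {
        return headers;
    }

    @Override
    public InputStream getInputStream() {
        // A fresh stream per call, so the data can be read more than once.
        return new ByteArrayInputStream(data);
    }
}
```

With the name lookup in place, inherited getters like getContentType() and getContentLength() work without being overridden, since they delegate to getHeaderField(String).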
When looking at the source code of URLConnection (for example here), you will quickly notice that most of its method implementations are dummies that either return null or throw an UnknownServiceException.
This is important: I don't know exactly which of these you need to implement! In order to find out, I used the MyHttpUrlConnection class and added the following to almost every function:
    System.out.println("function xyz called!");
    super.xyz();
I was lazy, though, and didn't check all of them. So far, everything seems to be working fine %)
The next class was MyHttpUrlConnection. I am not 100% sure if I actually needed to subclass HttpURLConnection, but I did so anyway, because it has a protected constructor that will implicitly be called with a new sun.net.www.protocol.http.Handler. That handler would obviously not follow our http policy, so I just wanted to be certain (cf. source code line 801). The class thus looks rather empty:
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;
    import sun.net.www.protocol.http.Handler;
    import sun.net.www.protocol.http.HttpURLConnection;

    public class MyHttpUrlConnection extends HttpURLConnection {
        protected MyHttpUrlConnection(URL url, Handler handler) {
            this(url, null, handler);
        }
        public MyHttpUrlConnection(URL url, Proxy proxy) {
            this(url, proxy, new MyUrlConnectionHandler()); // <--- No way sneaking around^^
        }
        protected MyHttpUrlConnection(URL url, Proxy proxy, Handler handler) {
            super(url, proxy, handler);
        }
        public MyHttpUrlConnection(URL url, String host, int port) {
            // Taken over from the HttpURLConnection source code.
            this(url, new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port)));
        }
    }
Now comes the most important part: the MyUrlConnectionHandler. Again, check this thread on where to put it. All the class itself needs to do is override the openConnection(URL, Proxy) function. Before posting my code, I will give you a run-through of what it does.
- If the given URL is a jpg, png, ... file:
  - Use a MyHttpUrlConnection object to get the resource's last date of modification on the server. This should only request the headers and NOT the entire resource. Otherwise, we wouldn't have gained anything. Credits go to this thread. I am not completely sure, though, whether I'm closing the URLConnection properly here. Better double-check if in doubt ;)
  - If there is no resource in the cache OR the resource in the cache is out of date:
    - Close the mini connection and open a "proper" one to download the whole thing.
    - Create a new CachedResource object and add it to the cache.
    - Close that new connection, too.
  - Return a new CachedUrlConnection object that holds the data. This might seem a bit stupid, as we already have everything, but the function needs to return a URLConnection.
- If there is any exception or we simply did not deal with a jpg, png, ... file, return a "default" MyHttpUrlConnection object to process the URL normally.
And the corresponding code looks as follows. Note that I used the Apache Commons IO class org.apache.commons.io.IOUtils:
    @Override
    protected URLConnection openConnection(URL url, Proxy proxy) throws IOException {
        try {
            // Is this some resource that we'd like to cache?
            if (ResourceCache.isCachableURL(url)) {
                // Retrieve whatever is in the cache first.
                ResourceCache cache = ResourceCache.getInstance();
                CachedResource resource = cache.getCachedResource(url);

                // Open a connection to the server to at least check for the last-modified field.
                MyHttpUrlConnection conn = new MyHttpUrlConnection(url, this); // Don't use URL#openConnection to avoid looping!
                conn.setRequestMethod("HEAD");
                conn.connect();
                long lastModified = conn.getLastModified();
                conn.disconnect(); // Close the HEAD connection; only the headers were needed.

                // Did we get the last-modified value at all?
                if (lastModified == 0) {
                    throw new Exception("No last-modified value could be read! \n\t" + url);
                }

                // Resource not cached or out of date?
                if (resource == null || resource.getLastModified() < lastModified) {
                    conn = new MyHttpUrlConnection(url, this);
                    conn.connect();
                    InputStream input = conn.getInputStream();
                    byte[] data = IOUtils.toByteArray(input);
                    Map<String, List<String>> headerFields = conn.getHeaderFields();
                    IOUtils.closeQuietly(input);
                    conn.disconnect(); // Close the download connection, too.
                    resource = new CachedResource(url.getFile(), data, headerFields, lastModified); // I use url.getFile() to store the file on my hard drive.
                    cache.addCachedResource(url, resource);
                }
                return new CachedUrlConnection(url, resource);
            }
        } catch (Exception e) {
            // Fall through to the default connection below.
            e.printStackTrace();
        }
        // Return the default HttpURLConnection in our wrapper class.
        return new MyHttpUrlConnection(url, proxy, this);
    }
One last thing: To be on the safe side, NEVER use the URL#openConnection method inside the MyUrlConnectionHandler#openConnection function. Writing it down this way makes it pretty obvious why, but today, it took me quite a while to figure out where the infinite loop was coming from %) Use the constructor and call connect() instead.
I hope this helps someone; otherwise, it has at least been a good exercise for me ^^