It's perfectly easy to download all images from a website using wget.
But I need this feature on the client side, ideally in Java.
I know wget's source can be accessed online, but I don't know any C and the source is quite complex. Of course, wget also has other features that "blow up the source" for me.
Java has a built-in HttpClient, but I don't know how sophisticated wget really is: would it be hard to re-implement the "download all images recursively" feature in Java?
How is this done, exactly? Does wget fetch the HTML source of the given URL, extract all URLs with the given file extensions (.jpg, .png) from the HTML, and download them? Does it also search for images in the stylesheets linked from that HTML document?
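To illustrate what I mean by the extraction step, here is a rough sketch using only the JDK. The regex, the sample HTML, and the class name are just my own illustration (a naive regex like this will miss single-quoted or oddly formatted attributes, so a real HTML parser would be more robust):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageUrlExtractor {
    // Naive pattern: matches src="..." inside <img> tags only; it does not
    // handle single quotes, unquoted values, or images referenced from CSS.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img[^>]+src\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Extracts image URLs from an HTML string, resolving relative ones against base. */
    public static List<String> extractImageUrls(String html, URI base) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // URI.resolve handles both relative and absolute src values
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<img src=\"/logo.png\">"
                + "<img alt=\"x\" src=\"https://cdn.example.com/pic.jpg\">"
                + "</body></html>";
        URI base = URI.create("https://example.com/gallery/");
        for (String url : extractImageUrls(html, base)) {
            System.out.println(url);
        }
    }
}
```

For the example HTML above, the relative `/logo.png` would resolve against the base URI while the absolute CDN URL would pass through unchanged.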
How would you do this? Would you use regular expressions to find (both relative and absolute) image URLs in the HTML document and let HttpClient download each of them? Or is there already a Java library that does something similar?
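For the download step itself, the built-in client (java.net.http, available since Java 11) looks sufficient to me. This is only a sketch of what I have in mind; the image URL and target filename are made up, and running it obviously needs network access:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class ImageDownloader {
    /** Builds a plain GET request for the given image URL. */
    public static HttpRequest buildRequest(URI imageUrl) {
        return HttpRequest.newBuilder(imageUrl).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        // Hypothetical image URL and target file, purely for illustration
        URI imageUrl = URI.create("https://example.com/images/logo.png");

        // Stream the response body straight to disk instead of buffering it
        HttpResponse<Path> response = client.send(
                buildRequest(imageUrl),
                HttpResponse.BodyHandlers.ofFile(Path.of("logo.png")));
        System.out.println("Saved " + response.body()
                + " (HTTP " + response.statusCode() + ")");
    }
}
```

Looping this over the URLs extracted from the HTML is what I imagine the core of the feature to be; what I'm unsure about is everything wget does on top (recursion depth, politeness, retries).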