I am trying to scrape a web page, but I also want to save its static files, so that when I reproduce the website locally it is as close as possible to the original.
For example, if you open any Wikipedia page and use Save As, the browser downloads the HTML page plus a folder containing images, CSS, and script files. That folder is my focus.
I searched the internet but couldn't find anything that helped. So far I have a function that takes a list of URLs and saves the HTML locally, but again I want the static files too.
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

private List<FileRelation> scrapeUrls(List<String> urlsToScrape, String jobPath) {
    List<FileRelation> files = new ArrayList<>();
    for (String url : urlsToScrape) {
        try {
            // Fetch and parse the page
            Document doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20040924 Epiphany/1.4.4 (Ubuntu)")
                    .timeout(5000)
                    .get();
            // Save the HTML under a random name; try-with-resources closes
            // the writer, and Paths avoids the hard-coded "\\" separator
            String fileName = UUID.randomUUID() + ".html";
            Path target = Paths.get(jobPath, fileName);
            try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                writer.write(doc.html());
            }
            files.add(new FileRelation(fileName, jobPath));
        } catch (Exception e) {
            System.out.println("ScrapingService - " + e);
        }
    }
    return files;
}
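
What I'm imagining is something like the sketch below: after parsing a page, select every img[src], script[src], and link[href] element, resolve each URL with absUrl(), and download the raw bytes with Jsoup's ignoreContentType(true). The saveAssets name and the "assets" subfolder are just my own invention, and I know this selector list probably misses things (srcset, fonts, url(...) references inside CSS), so I may be going about it the wrong way:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Sketch only: download the static files referenced by a parsed page.
private void saveAssets(Document doc, String jobPath) {
    // img and script use "src"; link (stylesheets, icons) uses "href"
    for (Element el : doc.select("img[src], script[src], link[href]")) {
        String attr = el.hasAttr("src") ? "src" : "href";
        String assetUrl = el.absUrl(attr);  // resolves relative URLs against the page
        if (assetUrl.isEmpty()) {
            continue;
        }
        try {
            // ignoreContentType(true) lets Jsoup return non-HTML bodies;
            // maxBodySize(0) removes the default 1 MB response cap
            byte[] bytes = Jsoup.connect(assetUrl)
                    .ignoreContentType(true)
                    .maxBodySize(0)
                    .timeout(5000)
                    .execute()
                    .bodyAsBytes();
            // Use the last path segment as the file name, dropping any query
            // string (name collisions are possible; good enough for a sketch)
            String name = assetUrl.substring(assetUrl.lastIndexOf('/') + 1).split("\\?")[0];
            if (name.isEmpty()) {
                continue;
            }
            Path target = Paths.get(jobPath, "assets", name);
            Files.createDirectories(target.getParent());
            Files.write(target, bytes);
        } catch (IOException e) {
            System.out.println("ScrapingService - asset failed: " + assetUrl);
        }
    }
}

The idea would be to call saveAssets(doc, jobPath) right after the get() call in the loop above. I realize the saved HTML would still point at the original URLs, so I'd probably also have to rewrite the src/href attributes to the local copies, but first I'd like to know whether this download approach is even the right one.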