
I am trying to download all the useful files from a directory on a website (https://cgran.org/browser/projects/ucsb_jello). By 'useful files', I mean the files I can see when I browse to the directory (the cpp, python and other source files). I don't want to download any html files.

I followed the suggestion in "Using wget to recursively fetch a directory with arbitrary files in it" and used the following command:

wget -r --no-parent --reject "index.html*" --no-check-certificate https://cgran.org/browser/projects/ucsb_jello

Unfortunately, it downloads a lot of unnecessary files, i.e., files that I don't see on the website. It also misses some important cpp files that are inside the sub-folders of the directory.
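I am wondering whether an accept list would be closer to what I want; the extensions below are only my guess at what counts as 'useful':

wget -r --no-parent --no-check-certificate -A "*.cc,*.cpp,*.h,*.py" https://cgran.org/browser/projects/ucsb_jello

(As far as I understand, with -A wget may still fetch html pages temporarily to follow their links and then delete them, which would be fine for my purpose.)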

How can I download all the sub-folders and the associated files of that directory with a single wget command? Any feedback will be appreciated.

Thanks,

Nazmul

  • What types of files is it downloading? I'm guessing css and maybe javascript, in which case you just need to add those to your reject list. Have you considered the answer to this question? https://stackoverflow.com/questions/8755229/how-to-download-all-files-but-not-html-from-a-website-using-wget?rq=1 – Kallmanation Nov 20 '17 at 16:49
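
A minimal sketch of the reject-list approach that comment describes (the extra extensions to reject are an assumption, not confirmed for this site):

wget -r --no-parent --no-check-certificate --reject "index.html*,*.css,*.js" https://cgran.org/browser/projects/ucsb_jello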

0 Answers