
I am trying to download all the images from this link. I want images from only the hydraulics section, so I used --no-parent, but when I run the command

wget -r --no-parent -e robots=off --user-agent="Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0" -A png http://indiabix.com/civil-engineering/hydraulics/

it only downloads the index.html.

I searched this issue on the web, and Stack Overflow already has two questions:

but they do not help. I also started a bounty on the latter question, but I wonder if anyone can suggest a workaround for my case?

– Naveen

2 Answers


Quite simple:

  • there are no images on the link you provided.

The tiny icons ("View Answer" etc.) are part of a CSS definition for the anchor (background-image). As of now, wget will not parse external CSS and pick up images referenced from there.
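Since wget skips CSS-referenced images, one hedged workaround is to extract the url(...) references from the stylesheet yourself and fetch them separately. The CSS snippet below is a made-up sample standing in for the site's real stylesheet; the path is invented for illustration.

```shell
# Hypothetical workaround: wget does not follow url(...) references in
# external CSS, so pull them out manually. The $css sample is invented;
# in practice you would first download the site's stylesheet.
css='a.view { background-image: url(/_files/images/icon.png); }'

# Extract every url(...) token, then strip the "url(" and ")" wrappers.
urls=$(printf '%s\n' "$css" | grep -oE 'url\([^)]+\)' | sed -E 's/url\(|\)//g')
echo "$urls"

# In practice, each extracted path could then be fetched, e.g.:
#   printf '%s\n' "$urls" | while read -r u; do wget "http://indiabix.com$u"; done
```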

With -A png, wget will even stop at the first file (the .html page) since it doesn't match the accept pattern.

I've succeeded in downloading everything with

   lwp-rget --hier --nospace http://indiabix.com/civil-engineering/hydraulics/

The LWP Perl packages from CPAN need to be installed (on openSUSE, search for them with zypper se libwww).

– Axel Amthor
  • But it should follow the links and switch to other sections, and there are images in between the questions. For example, on the link I provided, it should visit and then select page 2 from the bottom menu, or select different sections from the left sidebar – Naveen Aug 18 '14 at 11:01

The answer depends on knowing the path to the images folder, so that it can be added to the list of directories to be included (without the --include parameter, the whole site would be fetched).
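To discover that path, one sketch of an approach is to fetch the index page once and list the image paths it references. The markup sample below is invented for illustration; in practice you would substitute the real page, e.g. html=$(wget -q -O - 'http://indiabix.com/civil-engineering/hydraulics/').

```shell
# Hypothetical sketch: list the img src paths a page references, to find
# the directory that should be passed to --include. The $html sample is
# made up; a real run would fetch the actual index page instead.
html='<img src="/_files/images/q1.png"> <img src="/_files/images/q2.png">'

# Extract every src="..." attribute, strip the wrapper, and de-duplicate.
paths=$(printf '%s\n' "$html" | grep -oE 'src="[^"]+"' | sed -E 's/src="|"//g' | sort -u)
echo "$paths"
```

The unique directory prefixes in that list are the candidates for --include.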

   wget 'http://indiabix.com/civil-engineering/hydraulics/' \
     --convert-links --adjust-extension --recursive --page-requisites \
     --no-directories --directory-prefix=output \
     --include '/civil-engineering/hydraulics','/_files/images'

– Alf Eaton