
I would like to extract the image URLs from a page's HTML code using bash commands and then download all the images from that page. I am not sure whether it is possible, as sometimes they are stored in folders which I wouldn't have access to. But is it possible to download them from the source code?

I have written this so far:

wget -O plik.txt $1 
grep *.jpg plik.txt > wget
grep *.png plik.txt > wget
grep *.gif plik.txt > wget
rm plik.txt
julswion
  • You cannot parse HTML from a Bash script. To parse any markup language you need a specific parser (the same goes for HTML, XML, SGML, JSON, YAML, INI and even CSV). – Léa Gris Mar 20 '22 at 17:13
  • Try https://superuser.com/questions/1219455/how-to-download-all-images-from-a-website-using-wget or https://stackoverflow.com/questions/4602153/how-do-i-use-wget-to-download-all-images-into-a-single-folder-from-a-url – MichalH Mar 20 '22 at 17:14

1 Answer


Using lynx (a text web browser) in non-interactive mode, and GNU xargs:

#!/bin/bash

lynx -dump -listonly -image_links -nonumbers "$1" |
grep -Ei '\.(jpg|png|gif)$' |
tr '\n' '\000' |
xargs -0 -- wget --no-verbose --
  • This will start downloading the matching image URLs found in the web page given in $1, straight away.

  • It will include both images embedded in the page and images that are linked to. Removing -image_links will skip the embedded images.

  • You can add/remove whichever extensions you want to download, following the pattern I provided for .jpg, .png, and .gif. (grep -i is case insensitive).

  • The reason for using null delimiters (via tr) is so that xargs -0 can be used, which avoids problems with URLs that contain a single quote/apostrophe (').

  • The --no-verbose flag for wget just simplifies the log output. I find it easier to read if downloading a large list of files.

  • Note that regular GNU wget will handle duplicate filenames by appending a number (foo.jpg.1, etc.). However, busybox wget, for example, just exits if a filename already exists, abandoning further downloads.

  • You can also modify the xargs to just print a list of the files to be downloaded, so you can review it first: xargs -0 -- sh -c 'printf "%s\n" "$@"' _
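For example, a dry-run variant of the whole script could look like this sketch (it only prints the URLs that would be fetched; nothing is downloaded):

#!/bin/bash

# List the matching image URLs on the page given as $1, for review only
lynx -dump -listonly -image_links -nonumbers "$1" |
grep -Ei '\.(jpg|png|gif)$' |
tr '\n' '\000' |
xargs -0 -- sh -c 'printf "%s\n" "$@"' _

You could save it as, say, list-images.sh and run ./list-images.sh "https://example.com/page" (the script name and URL are just placeholders), check the output, and then switch the last line back to the wget form to actually download.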

dan
  • Thanks, this is exactly what I need. Just note that if you get paths with query parameters like `.../image.jpg?size=medium` you should remove the `$` from the `grep` portion – Sridhar Sarnobat Aug 17 '23 at 17:23
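As a rough sketch of that adjustment (the exact patterns below are my own guesses, not from the answer), the filter line could become one of:

# drop the trailing $ so URLs like .../image.jpg?size=medium still match
grep -Ei '\.(jpg|png|gif)'

# or keep some anchoring while allowing an optional query string
# (assumes GNU grep, which accepts $ inside an alternation)
grep -Ei '\.(jpg|png|gif)(\?|$)'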