I found a website with a chart, and I want to download something from one of the grid cells in that chart (for example, the codename volteer and its associated recovery image links).
How can I use wget and grep to download the latest image?

Please read [ask] then try again. – Ed Morton Jan 26 '22 at 12:55
1 Answer
First, keep in mind that websites generally use HTML, which belongs to Chomsky Type-2 (context-free languages), whilst regular expressions (used in grep, among other tools) are designed for Chomsky Type-3 (regular) languages, so situations may arise where they do not suffice. For further discussion see Using regular expressions to parse HTML: why not?
The question that arises is: will a regular expression suffice for your use case? After inspecting the website, the download links turn out to be quite predictable; page source excerpt:
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13816.82.0_volteer_recovery_stable-channel_mp-v6.bin.zip">90</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13904.55.0_volteer_recovery_stable-channel_mp-v6.bin.zip">91</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13982.88.0_volteer_recovery_stable-channel_mp-v6.bin.zip">92</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_14092.77.0_volteer_recovery_stable-channel_mp-v8.bin.zip">93</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_14150.87.0_volteer_recovery_stable-channel_mp-v8.bin.zip">94</a>
Thus it is possible to use the regular expression https[^"]*volteer[^"]*zip
which finds URLs that start with https, contain volteer, and end with zip. I use [^"]* because " is the delimiter of the <a> tag's href value and I do not want to include it in the match. You might do your task as follows:
- use wget to download the linked site as site.html
- use grep -o 'https[^"]*volteer[^"]*zip' to get the URLs
- use tail -1 to get the last (latest) of them
- use wget to download it
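
The four steps above could be put together roughly as follows. This is only a sketch: PAGE_URL is a placeholder for the address of the page holding the chart (it is not given here), and CODENAME and latest_image_url are names made up for illustration. It also assumes, as the steps do, that the page lists images oldest first, so the last match is the latest one.

```shell
#!/bin/sh
# PAGE_URL is a placeholder (assumption): set it to the page that holds the chart.
PAGE_URL="${PAGE_URL:-}"
# CODENAME is a hypothetical variable; volteer is the example from the question.
CODENAME="${CODENAME:-volteer}"

# Print the last matching recovery-image URL found in the HTML file given as $1.
# grep -o prints only the matched text; [^"]* keeps the match inside the href value.
latest_image_url() {
    grep -o "https[^\"]*${CODENAME}[^\"]*zip" "$1" | tail -n 1
}

# Nothing is downloaded unless PAGE_URL has been set.
if [ -n "$PAGE_URL" ]; then
    wget -O site.html "$PAGE_URL"        # step 1: save the page as site.html
    url=$(latest_image_url site.html)    # steps 2 and 3: extract URLs, keep the last one
    if [ -n "$url" ]; then
        wget "$url"                      # step 4: download the image
    else
        echo "no matching link found for $CODENAME" >&2
    fi
fi
```

To try the extraction without downloading anything, save the page source excerpt to a file and run latest_image_url on it; it should print the last URL in that file. If the page ever stops listing versions in ascending order, tail alone will no longer pick the latest one and the URLs would need sorting first.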
