
I found a website with a chart, and I want to download something from a cell in that chart (for example, the codename volteer and its corresponding recovery image links).
How can I use wget and grep to download the latest image?

OctonalXX

1 Answer


First, keep in mind that websites generally use HTML, which belongs to Chomsky Type-2 (context-free) languages, whereas regular expressions (used by grep, among others) are designed for Chomsky Type-3 (regular) languages, so there may be situations where they do not suffice. For further discussion, see Using regular expressions to parse HTML: why not?

The question that arises is whether a regular expression will suffice for your use case. After inspecting the website, the download links turn out to be quite predictable. Here is an excerpt of the page source:

<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13816.82.0_volteer_recovery_stable-channel_mp-v6.bin.zip">90</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13904.55.0_volteer_recovery_stable-channel_mp-v6.bin.zip">91</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13982.88.0_volteer_recovery_stable-channel_mp-v6.bin.zip">92</a> 
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_14092.77.0_volteer_recovery_stable-channel_mp-v8.bin.zip">93</a>    
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_14150.87.0_volteer_recovery_stable-channel_mp-v8.bin.zip">94</a>

Thus it is possible to use the regular expression https[^"]*volteer[^"]*zip, which finds URLs that start with https, contain volteer, and end with zip. I use [^"]* because " is the delimiter around the <a> tag's href value, and I do not want it included in the match. You could do your task as follows:
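To see what the regular expression extracts, here is a small sketch that runs it against two lines copied from the page-source excerpt above instead of the live page. grep -o prints only the matched part of each line, so the <a href="..."> markup and the quotes are dropped.

```shell
#!/bin/sh
# Sample input: two lines taken from the page-source excerpt above.
sample='<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_13816.82.0_volteer_recovery_stable-channel_mp-v6.bin.zip">90</a>
<a href="https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_14150.87.0_volteer_recovery_stable-channel_mp-v8.bin.zip">94</a>'

# -o makes grep print only the matching text, one URL per line.
# [^"]* stops each match at the closing quote of the href value.
printf '%s\n' "$sample" | grep -o 'https[^"]*volteer[^"]*zip'
```

This prints the two bare .bin.zip URLs, one per line, in the same order they appear in the page.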

  1. Use wget to download the linked site as site.html.
  2. Use grep -o 'https[^"]*volteer[^"]*zip' site.html to extract the URLs.
  3. Use tail -1 to keep only the last (latest) of them, since the page lists them in ascending version order.
  4. Use wget to download that URL.
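The four steps could be combined into a script along these lines. This is a minimal sketch, assuming the chart page still lists the links oldest first, as in the excerpt. The page address is passed as the first argument because it is not included here, and the function names latest_url and download_latest are hypothetical.

```shell
#!/bin/sh
# Hypothetical codename to look for; change it for other boards.
CODENAME='volteer'

# Steps 2 and 3: print the last https...<codename>...zip URL found in file $2.
# Relies on the page listing links oldest first (assumption based on the excerpt).
latest_url() {
    grep -o "https[^\"]*${1}[^\"]*zip" "$2" | tail -1
}

# Steps 1 and 4: fetch the chart page ($1), then download the latest image.
download_latest() {
    wget -O site.html "$1" || return 1
    url=$(latest_url "$CODENAME" site.html)
    if [ -z "$url" ]; then
        echo "no $CODENAME link found in site.html" >&2
        return 1
    fi
    wget "$url"
}

# Only download when a page URL is given, so the functions can be
# sourced or tested without network access.
if [ "$#" -ge 1 ]; then
    download_latest "$1"
fi
```

Run it as sh latest.sh PAGE_URL, with PAGE_URL replaced by the chart page's address. If the page order cannot be relied on, replacing tail -1 with sort -V | tail -1 (version sort, available in GNU sort) should pick the highest version number instead.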
Daweo