I've created an ugly one-liner that works, but I'd like to make it simpler and easier for others to read. It's used in a Dockerfile that builds an image to be run with Docker.
curl -s -L http://www.nxfilter.org/ | grep Download | sed -e 's/<a /\n<a /g' |
sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
xargs -n1 curl -s -L | grep zip | sed -e 's/<a /\n<a /g' |
sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' |
grep -v dropbox | grep -v logon | grep -v cloud | grep zip
Or, without the manual line breaks (the '"'"' sequences are just embedded single quotes, so the sed bracket expressions match either a single or a double quote around the href):
curl -s -L http://www.nxfilter.org/|grep Download|sed -e 's/<a /\n<a /g'|sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|xargs -n1 curl -s -L|grep zip|sed -e 's/<a /\n<a /g'|sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'|grep -v dropbox|grep -v logon|grep -v cloud|grep zip
Step 1: visit nxfilter.org and follow the redirects, which land on www.nxfilter.org/p2/index.html
Step 2: parse the homepage HTML for the URL of the Download page, currently www.nxfilter.org/p2/?page_id=93 (it's a blog-type site, so that URL could change in the future)
Step 3: parse the Download page HTML for the URL of nxfilter*.zip, which is currently http://nxfilter.org/download/nxfilter-3.0.5.zip
Step 4: download it as nxfilter.zip
Step 5: the Dockerfile then continues with the commands that set up the environment NxFilter runs in inside the final Docker container (steps 1-4 are spelled out as plain shell in the sketch after this list)
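Spelled out with intermediate variables, this is roughly what the one-liner is doing. It's only a sketch: I haven't tested it in this exact form, the grep -o calls assume the hrefs are double-quoted (the sed bracket expressions also handled single quotes), head -n 1 assumes the first matching link is the right one, and it assumes the hrefs are absolute URLs.

# Steps 1 and 2: fetch the homepage (following redirects) and pull the
# href of the Download link out of the HTML.
download_page=$(curl -s -L http://www.nxfilter.org/ |
    grep Download |
    grep -o 'href="[^"]*"' |
    head -n 1 |
    cut -d'"' -f2)

# Step 3: fetch the Download page and pull out the nxfilter .zip link,
# skipping the dropbox/logon/cloud links.
zip_url=$(curl -s -L "$download_page" |
    grep -o 'href="[^"]*\.zip"' |
    cut -d'"' -f2 |
    grep -v -e dropbox -e logon -e cloud |
    head -n 1)

# Step 4: download it under a fixed name for the rest of the build to use.
curl -s -L -o nxfilter.zip "$zip_url"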
Surely there is a simpler way to get that URL for the .zip?
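Factoring the repeated sed pair into a function at least removes the duplication, but the quoting is still hard to read (untested sketch; same commands as above):

# Reusable helper: put each <a> tag on its own line, then strip
# everything except the value of its href attribute.
extract_hrefs() {
    sed -e 's/<a /\n<a /g' |
        sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'
}

curl -s -L http://www.nxfilter.org/ | grep Download | extract_hrefs |
    xargs -n1 curl -s -L | grep zip | extract_hrefs |
    grep -v dropbox | grep -v logon | grep -v cloud | grep zip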
Related: Easiest way to extract the urls from an html page using sed or awk only
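For reference, an awk-only href extraction along those lines looks roughly like this (a sketch: it needs GNU awk, which treats a multi-character RS as a regex, it only catches double-quoted hrefs, and it is untested against this particular page):

# Print every double-quoted href value in the page, one per line.
curl -s -L http://www.nxfilter.org/ |
    awk -v RS='href="' -F'"' 'NR > 1 { print $1 }'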