I am writing a bash script that extracts the links to PDF files from an HTML page and downloads them. Here is the pipeline that does the extraction:
curl -s https://info.uqam.ca/\~privat/INF1070/ |
sed 's/.*href="//' |
sed 's/".*//' |
sed '/^[^\.]/d' |
sed '/\.[^p][^d][^f]$/d' |
sed '/^$/d' |
sed '/\/$/d'
Result:
./07b-reseau.pdf
./07a-reseau.pdf
./06b-script.pdf
./06a-script.pdf
./05-processus.pdf
./04b-regex.pdf
./181-quiz1-g1-sujet.pdf
./03b-fichiers-solution.pdf
./04a-regex.pdf
./03d-fichiers.pdf
./03c-fichiers.pdf
./03b-fichiers.pdf
./03a-fichiers.pdf
./02-shell.pdf
./01-intro.pdf
./01-intro.pdf
./02-shell.pdf
./03a-fichiers.pdf
./03b-fichiers.pdf
./03b-fichiers-solution.pdf
./03c-fichiers.pdf
./03d-fichiers.pdf
./04a-regex.pdf
./04b-regex.pdf
./05-processus.pdf
./06a-script.pdf
./06b-script.pdf
./07a-reseau.pdf
./07b-reseau.pdf
./181-quiz1-g1-sujet.pdf
It works fine, but I was wondering whether there is a better way (still using only sed) to do this with fewer sed commands.
Thank you.
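
For reference, one possible consolidation is a single sed call that prints only the lines where the substitution succeeds. This is a sketch, not a drop-in equivalent: it assumes every wanted link is written as href="./something.pdf" and that the last matching href on a line is the one to keep. The function name extract_pdf_links and the sample HTML are made up for illustration. Note also that the filter '/\.[^p][^d][^f]$/d' in the original pipeline only deletes a line when all three characters differ from p, d and f, so a name such as ./notes.png would slip through it, whereas the sketch requires a literal .pdf suffix.

```shell
#!/bin/sh
# Hypothetical helper: reads HTML on stdin and prints relative PDF links.
# -n suppresses automatic printing; the trailing p flag prints a line only
# when the substitution matched, so no separate delete commands are needed.
extract_pdf_links() {
    sed -n 's/.*href="\(\.[^"]*\.pdf\)".*/\1/p'
}

# Made-up sample input, used here so the sketch runs without network access.
extract_pdf_links <<'EOF'
<a href="./01-intro.pdf">Intro</a>
<a href="./labs/">Labs</a>
<a href="https://example.org/other.pdf">External</a>
<a href="./notes.png">Image</a>
<p>No link here</p>
EOF
# prints: ./01-intro.pdf

# Against the real page it would be used as:
#   curl -s https://info.uqam.ca/\~privat/INF1070/ | extract_pdf_links
```

Only the first sample line is printed: the directory link and the image do not end in .pdf, and the external link does not start with a dot, which mirrors the '/^[^\.]/d' step of the original pipeline.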