
I want to get a list of directories from an HTTPS site for aria2c.

Since, as far as I know, unlike wget, there is no recursive option in aria2c, I am going to use a txt file of URLs as mentioned here.
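
For context, I mean something like this: a `urls.txt` file with one URL per line (the file name and the example entry are just illustrative),

https://physionet.org/files/mimic3wdb-matched/1.0/p00/p000020/3544749_0001.dat
...

which I would then pass to aria2c with its `-i`/`--input-file` option:

aria2c -i urls.txt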

So I need the list of directories.

This is the target HTTPS site: https://physionet.org/files/mimic3wdb-matched/1.0/

I tried lftp, but there were some certificate errors.

I would be grateful if you could let me know how to get the txt file.
Thank you!

Jake
  • Try this: `curl https://physionet.org/files/mimic3wdb-matched/1.0/ | grep -o -P '(?<=">).*(?=/)' | grep -v '\.\.'` – SIMULATAN Mar 21 '23 at 09:06
  • Thank you for answering! The command gives me only the top-level (parent) directories (e.g. /p00, /p01, ...); what I want is all the filenames (e.g. /p00/p000020/3544749_0001.dat, ...)! – Jake Mar 21 '23 at 09:13
  • Put it in a loop then; I'll see what I can do. – SIMULATAN Mar 21 '23 at 09:28

1 Answer


Try this hacked-together script.

function list_folder() {
    echo "Starting new run! $1"
    # fetch the directory index page (silent, follow redirects)
    content=$(curl -s -L 'https://physionet.org/files/mimic3wdb-matched/1.0/'"$1")
    # folders are the entries that end with a `/` (the `..` parent link is dropped)
    folders=$(echo "$content" | grep -o -P '(?<=">).*(?=/</a>)' | grep -v '\.\.')
    # files are all the entries that don't end with a `/`
    files=$(echo "$content" | grep -o -P '(?<=">).*[^/](?=<\/a>)')
    echo "FOLDERS: $folders"
    echo "FILES: $files"
    # recurse into each subfolder
    for folder in $folders; do
        list_folder "$1/$folder"
    done
}

# start at the top-level directory (no argument = empty path)
list_folder

It'll recursively search the directory listing and print all the folders and files it finds. If you want to save the filenames into a file, just redirect `$files` into it.
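
If you'd rather have full URLs that you can feed straight to aria2c, here is a rough variant; the `urls.txt` name and the `aria2c -i` step are additions on top of the original script, and it relies on the same grep patterns as above:

function list_folder() {
    content=$(curl -s -L 'https://physionet.org/files/mimic3wdb-matched/1.0/'"$1")
    folders=$(echo "$content" | grep -o -P '(?<=">).*(?=/</a>)' | grep -v '\.\.')
    files=$(echo "$content" | grep -o -P '(?<=">).*[^/](?=<\/a>)')
    # print one full URL per line instead of the FOLDERS/FILES debug output
    for file in $files; do
        echo "https://physionet.org/files/mimic3wdb-matched/1.0$1/$file"
    done
    for folder in $folders; do
        list_folder "$1/$folder"
    done
}

list_folder > urls.txt

# then hand the list to aria2c (-i reads one URI per line)
aria2c -i urls.txt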

You can also try making it run in parallel by appending an `&` to the recursive `list_folder` calls.
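
Roughly like this, replacing the loop at the end of `list_folder` (the `wait` keeps each call from returning before its background jobs finish; note that this spawns a lot of concurrent curl processes and the output lines will interleave):

    for folder in $folders; do
        list_folder "$1/$folder" &
    done
    wait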

SIMULATAN