Given a list of Colab notebooks, how can I download the .ipynb file of each one using wget or curl?

This question explains how to download notebooks stored on Google Drive, but what about notebooks stored on GitHub, in Colab directories (colab.research.google.com/notebooks/), or in other sources?

LeoPret

1 Answer

There are 2 options I recommend, assuming all the target URLs are in a text file. Save the code to a .sh file (e.g. dlnb.sh) and all the URLs to a text file (e.g. list.txt), one per line, like this:

https://colab.research.google.com/notebooks/gpu.ipynb
https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb
https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo

tl;dr: I recommend solution 2, which uses gdown (just run pip install gdown), because wget cannot save a notebook under its real name when the URL does not contain that name. Then run bash dlnb.sh list.txt in a terminal.

1. wget and cat only. This one has one drawback: because it uses only wget, a link that does not contain a file name is saved as random_id_here.ipynb.

dlnb.sh

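# Extract the file ID (26 or more word characters or hyphens) from a URL.
grabid() { echo "$1" | grep -Eo '(\w|-){26,}'; }
# "|| [[ -n $line ]]" keeps the last line even if the file has no trailing newline.
cat "$1" | while read -r line || [[ -n $line ]]; do
    if [[ $line != *.ipynb ]]; then
        # No notebook name in the URL: download by ID and name the file after the ID.
        id=$(grabid "$line")
        wget -O "$id.ipynb" "https://docs.google.com/uc?export=download&id=$id"
    else
        wget "$line"
    fi
done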

I took the regular expression '(\w|-){26,}' and wrapped it in a function that extracts and returns the ID from a link:

grabid() { echo "$1" | grep -Eo '(\w|-){26,}'; }

The ID is assigned by calling grabid, where line is the URL:

id=$(grabid "$line")

Then the while loop reads the file line by line (the || [[ -n $line ]] test keeps a last line that has no trailing newline), and each notebook without a name in its URL is downloaded with wget:

wget -O "$id.ipynb" "https://docs.google.com/uc?export=download&id=$id"
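To see what the regular expression actually picks up, here is a small sketch that runs it on links from list.txt. The name extract_id is made up for this illustration, and keeping only the first match with head is my assumption, not something the scripts in this answer do. Note that the pattern is only a heuristic: a long notebook file name also matches it, which is why the scripts test for a .ipynb ending before extracting an ID.

```shell
#!/usr/bin/env bash
# extract_id is a hypothetical helper mirroring grabid from the answer.
# Assumption: only the first match matters, so head -n 1 drops any others.
extract_id() {
    # Print the first run of 26+ characters from [A-Za-z0-9_-], if any.
    printf '%s\n' "$1" | grep -Eo '[A-Za-z0-9_-]{26,}' | head -n 1
}

# A Drive-backed link: the long token is the file ID.
extract_id 'https://colab.research.google.com/drive/1sVsoBd9AjckIXThgtZhGrHRfFI6UUYOo'

# A GitHub-backed link: the "ID" found here is really the notebook's file name.
extract_id 'https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb'

# A short link with no long token: nothing is printed.
extract_id 'https://colab.research.google.com/notebooks/gpu.ipynb'
```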
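If the purpose of || [[ -n $line ]] is unclear, the following sketch may help. It counts the lines of a file whose last line has no trailing newline, without doing any downloads. The function name count_urls and the example.com URLs are invented for this demonstration.

```shell
#!/usr/bin/env bash
# count_urls is a hypothetical helper used only to demonstrate the loop.
count_urls() {
    local n=0 line
    # read returns non-zero on a final line that lacks a newline, but it
    # still fills $line; the -n test lets the loop body run for that line.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue   # skip blank lines
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

tmp=$(mktemp)
# Two URLs, the second one without a trailing newline.
printf 'https://example.com/a.ipynb\nhttps://example.com/b.ipynb' > "$tmp"
count_urls "$tmp"
rm -f "$tmp"
```

Without the || [[ -n $line ]] part, the second URL in this file would be skipped.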

OR

2. A better solution: install gdown. This works like solution 1 but uses gdown instead of wget.

dlnb.sh

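# Extract the file ID (26 or more word characters or hyphens) from a URL.
grabid() { echo "$1" | grep -Eo '(\w|-){26,}'; }
cat "$1" | while read -r line || [[ -n $line ]]; do
    if [[ $line != *.ipynb ]]; then
        # No notebook name in the URL: pass the extracted ID to gdown.
        gdown "$(grabid "$line")"
    else
        gdown "$line"
    fi
done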

If the URL does not end with .ipynb (the if [[ $line != *.ipynb ]] test), the script extracts the ID with $(grabid "$line") and gdown downloads the notebook by that ID. Where solution 1 saves such a notebook as id_of_notebook.ipynb, gdown saves it under its own name.
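For links of the form colab.research.google.com/github/..., fetching the Colab URL itself may return the Colab web page rather than the notebook file. One possible workaround, sketched below, is to rewrite such a link to the matching raw.githubusercontent.com URL before downloading. The helper to_raw_url is hypothetical and not part of the scripts above, and it assumes a public repository and the usual org/repo/blob/branch/path layout.

```shell
#!/usr/bin/env bash
# to_raw_url is a hypothetical helper, not part of the answer's scripts.
# Assumption: the link looks like
#   https://colab.research.google.com/github/ORG/REPO/blob/BRANCH/PATH
to_raw_url() {
    local url=$1 prefix='https://colab.research.google.com/github/'
    if [[ $url == "$prefix"*/blob/* ]]; then
        local rest=${url#"$prefix"}     # ORG/REPO/blob/BRANCH/PATH
        local repo=${rest%%/blob/*}     # ORG/REPO
        local tail=${rest#*/blob/}      # BRANCH/PATH
        printf 'https://raw.githubusercontent.com/%s/%s\n' "$repo" "$tail"
    else
        printf '%s\n' "$url"            # leave any other link unchanged
    fi
}

to_raw_url 'https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb'
```

In dlnb.sh the result could then be passed to the downloader, e.g. wget "$(to_raw_url "$line")".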

Ù w Ú
  • This does not explain how to download colab links that do not have the id in the url. Like https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb – LeoPret Nov 14 '22 at 23:53
  • @LeoPret Both options in my answer can download links without an ID in the URL. Have you read my answer? The `else wget $line;` branch is for downloading the "colab links that do not have the id in the url" that you mention. If you want to download such a link manually, just run e.g. `wget your_link_here`. Please read my answer again; I have already included your example in a **.txt** file to test my code. – Ù w Ú Nov 16 '22 at 00:26