I am looking for the easiest way to download Kaggle competition data (train and test) onto a virtual machine using bash, so that I can train on it there without uploading the data to git.
4 Answers
Fast-forward three years and you can use Kaggle's API through its CLI, for example:
kaggle competitions download favorita-grocery-sales-forecasting
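
The CLI has to be installed and given an API token before a command like the one above will work. Here is a minimal sketch of the token setup, assuming the token file `kaggle.json` (generated from your Kaggle account settings) has already been copied to the VM. The function name `install_kaggle_token` is hypothetical and not part of the Kaggle CLI.

```shell
#!/usr/bin/env bash
# Sketch: put a Kaggle API token where the `kaggle` CLI looks for it.
# install_kaggle_token is an illustrative helper, not a Kaggle command.

install_kaggle_token() {
    local token_file="$1"               # path to the kaggle.json you copied over
    local config_dir="${HOME}/.kaggle"  # default directory the CLI reads

    if [ ! -f "$token_file" ]; then
        echo "token file not found: $token_file" >&2
        return 1
    fi

    mkdir -p "$config_dir"
    cp "$token_file" "$config_dir/kaggle.json"
    # Restrict access so other users on the machine cannot read the key.
    chmod 600 "$config_dir/kaggle.json"
}

# Example usage (needs network access and accepted competition rules):
#   pip install kaggle
#   install_kaggle_token ~/kaggle.json
#   kaggle competitions download -c favorita-grocery-sales-forecasting
```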

First, you need to copy your cookie information for the Kaggle site into a text file. There is a Chrome extension that will help you do this. Copy the cookie information and save it as cookies.txt.
Now transfer the file to the EC2 instance using the command:
scp -i /path/my-key-pair.pem /path/cookies.txt user-name@ec2-xxx-xx-xxx-x.compute-1.amazonaws.com:~
Accept the competition's rules and copy the URLs of the datasets you want to download from kaggle.com. For example, the URL to download the sample_submission.csv file of the Intel & MobileODT Cervical Cancer Screening competition is: https://kaggle.com/c/intel-mobileodt-cervical-cancer-screening/download/sample_submission.csv.zip
Now, from the instance's terminal, use the following command to download the dataset onto the instance:
wget -x --load-cookies cookies.txt https://kaggle.com/c/intel-mobileodt-cervical-cancer-screening/download/sample_submission.csv.zip
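
If the cookies are missing or have expired, wget may save an HTML login page under the `.zip` name instead of the archive. Here is a small sketch of a sanity check you could run after the download, assuming the file is expected to be a zip archive. The helper name `is_zip_archive` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a downloaded file looks like a zip archive
# by inspecting its first two bytes (zip files start with "PK").

is_zip_archive() {
    local file="$1"
    [ -f "$file" ] || return 1
    [ "$(head -c 2 "$file")" = "PK" ]
}

# Example usage (the path reflects the directories that `wget -x` creates):
#   f=kaggle.com/c/intel-mobileodt-cervical-cancer-screening/download/sample_submission.csv.zip
#   if is_zip_archive "$f"; then unzip "$f"; else echo "not a zip, check cookies.txt" >&2; fi
```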

- Chrome Extension [link](https://chrome.google.com/webstore/detail/cookietxt-export/lopabhfecdfhgogdbojmaicoicjekelh) is not working. – Ashish Jul 11 '18 at 10:41
- It is working for me. Search for "chrome cookies.txt export". – Ernest S Kirubakaran Jul 11 '18 at 11:19

- Install the cookies.txt extension in Chrome and enable it.
- Log in to Kaggle.
- Go to the page of the challenge whose data you want.
- Click the cookies.txt extension icon at the top right; it downloads the current page's cookies as a cookies.txt file.
- Transfer the file to the remote server using scp or another method.
- Copy the data link shown on the Kaggle page (right-click and copy the link address).
- Run:
wget -x --load-cookies cookies.txt <datalink>
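
When a competition has several files, the same command can be repeated for each link. Here is a rough sketch, assuming the copied links are saved one per line in a text file. The function name `download_all`, the URL list file, and the `WGET` override variable (handy for a dry run) are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download every data link listed in a file, reusing one cookie file.
# WGET is an illustrative hook for swapping the downloader; it defaults to wget.

download_all() {
    local cookies="$1"   # cookies.txt exported from the browser
    local url_list="$2"  # text file with one data link per line
    local url

    [ -f "$cookies" ] || { echo "missing cookie file: $cookies" >&2; return 1; }

    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'.
        case "$url" in ''|'#'*) continue ;; esac
        "${WGET:-wget}" -x --load-cookies "$cookies" "$url" || return 1
    done < "$url_list"
}

# Example usage:
#   download_all cookies.txt urls.txt
```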
