0

Im doing a machine learning project in google colab. Each time an instance is started, I want to run these commands:

  ! mkdir ~/.kaggle # make directory ".kaggle"
  ! cp kaggle.json ~/.kaggle/ # copy the json file into the directory
  ! chmod 600 ~/.kaggle/kaggle.json # allocate required permission for the file
  ! kaggle datasets download -d alessiocorrado99/animals10 # download animal set
  ! unzip animals10.zip

These commands download and extract a dataset I need. However, it only needs to be ran the first run through only. When clicking "Run All" after the initial download of the dataset, it requires user input to decide whether to replace the files or not. I also don't want to keep downloading from kaggle and use resources unnecesarily.

My current approach is to run the script once then comment out the initialization script, but this takes time and effort.

How can I automate this process so a certain cell only runs on the first run of the runtime?

Jacob Waters
  • 307
  • 1
  • 3
  • 11

2 Answers2

0
  1. After initial execution of your commands create a dummy variable.

  2. If you happen to re-execute that code cell an IF will check if that variable is in memory:

    import os
    
    INIT_SHELL_COMMAND_LIST = [
      'mkdir ~/.kaggle # make directory ".kaggle"',
      'next_cmd',
      ]
    if not 'INIT_HAPPENED' in locals().keys():
        for command in INIT_SHELL_COMMAND_LIST:
            os.sys(command)
        INIT_HAPPENED = True
    
Anton Frolov
  • 112
  • 1
  • 9
0

This answer is specific to my problem but I hope this helps others.

Essentially the code that I ran had some measureable effect, in my case it creates a directory called 'raw-img'. In order to detect if the code ran, I used a conditional testing for the existance of the file path.

import os

if not os.path.isdir('raw-img'):
  print("firstRun")
  ! pip install kaggle # install kaggle library
  ! mkdir ~/.kaggle # make directory ".kaggle"
  ! cp kaggle.json ~/.kaggle/ # copy the json file into the directory
  ! chmod 600 ~/.kaggle/kaggle.json # allocate required permission for the file
  ! kaggle datasets download -d alessiocorrado99/animals10 # download animal set
  ! unzip animals10.zip

However, in any case, whether importing libraries, or really anything else, there should be some code you can write to test if that operation has occured. So simple use that test to determine whether the code should run again.

For example, an answer here shows how to test if a library is installed. https://stackoverflow.com/a/55194886/9053474 If the string is empty, then you know your library has not been installed and so you should run the script.

Jacob Waters
  • 307
  • 1
  • 3
  • 11