9

My issue is that I would like to connect a Google Colab instance with a Gitlab project, but neither SSH nor HTTPS seem to work. From the error messages, I suspect setting-related issues in Colab. Maybe I have to allow Colab to connect to Gitlab and put it on a whitelist somewhere?

Running the following shell commands from a Notebook in Colab while being in the '/content' directory

git config --global user.name "mr_bla"
git config --global user.email "bla@wbla.bla"
git clone https://gitlab.com/mr_bla/mr_blas_project.git

results in the following error messages:

Cloning into 'mr_blas_project'...
fatal: could not read Username for 'https://gitlab.com': No such device or address

I have generated SSH keys as I'm used to, but the SSH check

ssh -vvvT git@gitlab.com:mr_bla/mr_blas_project.git

fails, leading to the following error:

OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolving "gitlab.com:mr_bla/mr_blas_project.git" port 22
ssh: Could not resolve hostname gitlab.com:mr_bla/mr_blas_project.git: Name or service not known

Trying the SSH-way to clone a project doesn't work either:

git clone git@gitlab.com:mr_bla/mr_blas_project.git

results in:

Cloning into 'mr_blas_project'...
Host key verification failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

The Google Colab instance is running the following OS:

cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

I've checked, among many others, the following questions without success:

Jan Spörer
  • Hi were you able to connect your colab with gitlab? I want to upload my colab file on gitlab! Thanks – Chris_007 May 03 '20 at 04:13
  • Hi @Chris_007, no sorry, I ended up writing the project in a plain .py file and running everything locally. If your repository does not need to be private, maybe you can try mitra’s answer and make your GitLab repo public. And consider switching between SSH and HTTPS. – Jan Spörer May 04 '20 at 07:07

5 Answers

10

Here is the workflow I follow to keep my Google Colab notebooks under persistent version control with GitLab (I guess it would be quite similar with GitHub).

I use GitLab Personal Access Tokens (PATs) so that this also works with private repositories.

WorkFlow

  • Create a Personal Access Token in GitLab

    • From Edit Profile/User Settings go to Access Tokens
      • Then enter a name for the token (you will have to use it later) and an optional expiry date
      • Select desired scopes:
        • read_repository: Read-only (pull) for the repository through git clone
        • write_repository: Read-write (pull, push) for the repository.
      • Press Create personal access token
      • Save the personal access token somewhere safe. After you leave the page, you no longer have access to the token.
  • Then, for Colab to interact with GitLab, store the .git folder of the repository in a Google Drive folder so that it persists between Colab sessions

    • Let's say that you have a folder in Gdrive with some files you want to version control with Git:

      • /RootGDrive/Folder1/Folder2
    • Mount Google Drive in the Colab container file system. Let's say you mount it on /content/myfiles. Run these lines in a notebook cell (this outputs a URL that you have to visit to give the Colab instance OAuth2 access to your Google Drive):

      from google.colab import drive 
      drive.mount('/content/myfiles')
      
      • This mounts on the container File System the root folder of your Google Drive in /content/myfiles/MyDrive
    • Once mounted, change directory with the %cd magic command (!cd will not work: each shell command is executed in a temporary subshell, so the change is not persistent)

      %cd "/content/myfiles/MyDrive/Folder1/Folder2"
      !pwd
      
    • Once there, initialize the git repository. This is needed only the first time: because the folder lives in your Google Drive, the repository persists between sessions (otherwise it would be removed once you leave the Google Colab session).

       !git init
      
      • This creates the .git folder within your Google Drive folder
    • Now configure the usual git parameters locally (so they are stored in the .git folder); they are needed when pushing/pulling (again, this has to be done just the first time):

      !git config --local user.email your_gitlab_mail@your_domain.com 
      !git config --local user.name your_gitlab_name
      
    • Now add the remote using the PAT created before (again this is done just the first time):

      • Key Point: The remote URL format (it has to be over HTTPS) depends on whether the GitLab project (repo) is under a group/subgroups or not:

        • Under a group (there could be /group/subgroup1/subgroup2/.../project.git or just /group/project.git)

          !git remote add origin https://<pat_name>:<pat_code>@gitlab.com/group_name/subgroup1/project_name.git
          
        • NOT Under a group

          !git remote add origin https://<pat_name>:<pat_code>@gitlab.com/your_gitlab_username/project_name.git
          
    • Now the git repository is configured within the Google Drive folder, not just in the container file system, so you can pull/push and run all the usual git commands

      !git add .
      !git commit -m "First commit"
      !git push -u origin master
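
The remote URL format described above can also be assembled in Python. This is only a minimal sketch: the helper name build_remote_url and its parameters are hypothetical and not part of GitLab or Colab, and the percent-encoding step is an added assumption to guard against special characters in the token.

```python
from urllib.parse import quote


def build_remote_url(pat_name: str, pat_code: str, project_path: str,
                     host: str = "gitlab.com") -> str:
    """Return an HTTPS remote URL with the token embedded as credentials."""
    # Percent-encode the credentials so characters such as '@' or ':' cannot
    # break the URL; safe="" means no reserved character is left unescaped.
    user = quote(pat_name, safe="")
    secret = quote(pat_code, safe="")
    # Accept the project path with or without a leading slash or '.git' suffix.
    path = project_path.strip("/")
    if not path.endswith(".git"):
        path += ".git"
    return f"https://{user}:{secret}@{host}/{path}"


# Placeholder values only; substitute your own token name, token and project path.
example_url = build_remote_url("my_pat_name", "XXXX_PAT_CODE_XXXXX",
                               "group_name/subgroup1/project_name")
print(example_url)
```

The same function covers both cases: pass group_name/subgroup1/project_name for a project under a group, or your_gitlab_username/project_name otherwise.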
      

After this first-time setup, to keep version controlling the files in MyDrive/Folder1/Folder2 with Git and GitLab, create a notebook that mounts Google Drive and runs the git commands you want while you edit the other files in the folder. (Again, I guess it is very similar with GitHub; for me the Groups feature of GitLab is quite valuable.)

I would say the best way is to have a parametrized notebook that checks whether this is the first run: if so, it does the git initialization and so on; if not, it just adds/commits/pushes to the GitLab repository.
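
One way to sketch that first-run check in plain Python is shown here. The function name plan_git_commands is hypothetical, the remote URL is a placeholder, and the function only returns the commands as argument lists rather than running them.

```python
import tempfile
from pathlib import Path
from typing import List


def plan_git_commands(folder: Path, remote_url: str, message: str) -> List[List[str]]:
    """Return the git commands for this session as argument lists."""
    commands: List[List[str]] = []
    if not (folder / ".git").is_dir():
        # First run: no .git folder yet, so initialize and register the remote.
        commands.append(["git", "init"])
        commands.append(["git", "remote", "add", "origin", remote_url])
    # Every run: stage, commit and push.
    commands.append(["git", "add", "."])
    commands.append(["git", "commit", "-m", message])
    commands.append(["git", "push", "-u", "origin", "master"])
    return commands


# Demonstration in a temporary folder; the URL is a placeholder without credentials.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    url = "https://gitlab.com/group_name/project_name.git"
    first = plan_git_commands(folder, url, "First commit")
    (folder / ".git").mkdir()  # simulate an already initialized repository
    later = plan_git_commands(folder, url, "Update notebooks")

print(first[0], later[0])
```

Each returned list could then be passed to subprocess.run(cmd, cwd=folder, check=True), which also avoids depending on the notebook's current directory.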

Cloning

To just clone into the container FS (or into Google Drive if it is already mounted), use the same remote URL explained above with git clone:

  • Under a group

      !git clone https://<pat_name>:<pat_code>@gitlab.com/group_name/project_name.git
    
  • NOT Under a group

      !git clone https://<pat_name>:<pat_code>@gitlab.com/gitlab_user_name/project_name.git
    

Edit: I am adding the notebook I created to interact between Colab and GitLab, called Gitlab_Colab_Interaction.ipynb, so you can use it directly from Colab:

Imports

import os
from pathlib import Path

Parameters

# Paths
container_folder_abspath = Path('/content/myfiles')
gdrive_subfolder_relpath = Path('MyDrive/Colab Notebooks/PathTo/FolderYouWant') # No need to escape the space with pathlib Paths
gitlab_project_relpath = Path('group_name/subgroup1/YourProject.git') # Relative path (no leading slash): it is appended to 'https://...@gitlab.com/' below
# Personal Access Token
PAT_name = 'my_pat_name'
PAT_code = 'XXXX_PAT_CODE_XXXXX'

Mount Drive

from google.colab import drive
drive.mount(str(container_folder_abspath))


fullpath = container_folder_abspath / gdrive_subfolder_relpath # Path objects are joined with the / operator
%cd $fullpath
!pwd

Initialization (or not)

initialization = True
for element in fullpath.iterdir():
    if element.is_dir():
        if element.name == '.git':
            initialization = False
            print('Folder already initialized as a git repository!')
    

gitlab_url = 'https://' + PAT_name + ':' + PAT_code + '@gitlab.com/' + str(gitlab_project_relpath)
if initialization:
    !git init
    !git config --local user.email your_gitlab_mail@yourmail.com
    !git config --local user.name your_gitlab_user
    !git remote add origin $gitlab_url # Check that PATs are still valid
    !echo "GitLab_Colab_Interaction.ipynb" >> ".gitignore" # To ignore this file itself if it is included in the folder

else:
    print("### Current Status ###")
    !git status
    print("\n\n### Git log ###")
    !git log

Git Commands

# Git Add
!git add *.ipynb # For example to add just the modified notebooks

# Git Commit
!git commit -m "My commit message"

# Git Push
!git push -u origin master # As of now Gitlab keeps using the name master 
Gonzalo Polo
6

If it is a private repo, you could use a GitLab deploy token or a GitLab personal access token. You would then just run

git clone https://<deploy_username>:<deploy_token>@gitlab.example.com/tanuki/awesome_project.git

Note that you probably don't want the above code, with the sensitive <deploy_token>, exposed in your notebook. You could hide it by putting it in an executable script stored on your mounted Drive, for example, or I think you can hide the code cell.
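
Another way to keep the token out of the notebook text is to read it at run time. This is only a sketch under assumptions: the environment variable name GITLAB_TOKEN and the helpers read_token and mask are hypothetical, not something GitLab or Colab defines.

```python
import os
from getpass import getpass


def read_token(env_var: str = "GITLAB_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is unset."""
    token = os.environ.get(env_var)
    if not token:
        # getpass does not echo the typed value, so it stays out of the cell output.
        token = getpass("GitLab token: ")
    return token


def mask(text: str, token: str) -> str:
    """Replace the token with asterisks so the text is safe to print or log."""
    return text.replace(token, "****") if token else text


# Typical use in a notebook cell (not executed here, since it may prompt):
#   token = read_token()
#   url = f"https://<deploy_username>:{token}@gitlab.example.com/tanuki/awesome_project.git"
#   print(mask(url, token))
```

The token then lives only in memory for the session, and mask() helps avoid leaking it when printing the URL for debugging.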

3

After having contacted some Google employees directly: The functionality described above is not yet available for Google Colab. I've tried a couple things in the meantime, nothing worked.

If anyone knows if and when this feature will be added, please let me know.

Jan Spörer
  • I am not sure, but I think Personal Access Tokens let you do what you mention; I have added an answer in this regard. – Gonzalo Polo May 15 '21 at 04:04
  • @GonzaloPolo Thank you! I've unchecked my answer and accepted yours. I did not try your answer but it looks very comprehensive and will therefore probably work. – Jan Spörer May 16 '21 at 21:50
1

Be sure to add "!" as a prefix to your command in the Google Colab workspace, like so: !git clone https://gitlab.com/mr_bla/mr_blas_project.git

  • Thank you for the hint. I did this already. The git commands are being executed, the problem is that they fail at a later stage. – Jan Spörer Mar 12 '21 at 07:39
0

Change the repo visibility to public (from private?) in GitLab. I understand this might not always be possible, but doing that solved the problem for me.

mitra