2

I am using Google Colab in combination with a custom GCE VM, based on the instructions here. I now need a way to retrieve files from the VM without using the Colab interface, due to a bug described in this issue and this issue. I've reviewed the answers to this similar question about file storage on hosted instances, but I don't think they help in this case.

I've SSHed into the machine to look for the files, but I can't find the /content directory that I expected to see under the root. After digging through the file system, I found that the /mnt/stateful_partition/var/lib/docker directory uses about as much disk space as I'd expect for my data, and it contains a file object called colab-vmdisk that looks promising. I'm not sure how to proceed, but given the file path I suspect there's a Docker-based solution that I'm not aware of.

s_go
  • Colab notebooks are stored in Google Drive https://research.google.com/colaboratory/faq.html#:~:text=Colab%20notebooks%20are%20stored%20in,Google%20Drive%20file%20sharing%20instructions. – John Hanley Jun 07 '23 at 23:30
  • I am specifically looking for the data created by Colab when using the custom GCE VM connection option, not the notebook itself. These are the instructions for connecting Colab to a custom VM https://research.google.com/colaboratory/marketplace.html – s_go Jun 09 '23 at 10:04

2 Answers

1

@hidude562's answer is on point and worked for me. I had been looking for a method with good download speeds (as fast as when I used Google Drive with Colab on the hosted runtime).

Colab seems to manage the whole runtime inside a Docker container, as you rightly mentioned, @s_go. That also explains how popular libraries, including gdown, are available and up to date right from the start. I found it best to use gdown to download large files from Google Drive into Colab, since Google doesn't let you mount your personal Drive when using a custom GCE VM runtime because of some authorisation blockers. This method downloads files into Colab at the full speed Google is capable of (I've seen up to ~500 Mbps).
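For anyone who wants to try the gdown route, here is a rough sketch of what I mean. The function name fetch_from_drive, the file ID and the output path are placeholders I made up, and as far as I know gdown only works without extra setup when the Drive file is shared with "anyone with the link". Run it inside the Colab runtime (for example from a notebook cell via a shell escape).

```shell
# Hypothetical helper: download one Google Drive file by its ID with gdown.
# Assumes gdown is already installed in the Colab runtime and that the file
# is shared so that anyone with the link can read it.
fetch_from_drive() {
    file_id="$1"   # the ID part of the Drive share link (placeholder)
    out_path="$2"  # where to write the file inside the runtime
    gdown "https://drive.google.com/uc?id=${file_id}" -O "$out_path"
}

# Example call (placeholder values):
#   fetch_from_drive YOUR_FILE_ID /content/dataset.zip
```

The function only wraps a single gdown call, so you can just as well type the gdown line directly once you know the file ID.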

In addition, after extracting the file from the Docker container, I used FileZilla (SFTP) to download it to my local machine. That was as fast as expected: a direct download over SSH ran at only ~100 kbps for some reason, whereas FileZilla against the same VM reached up to ~13 Mbps (my Wi-Fi download bandwidth is about ~25 Mbps).

Hope this comment validates @hidude562's answer for other readers!

Thank you for your question, @s_go, and your answer, @hidude562! :)
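I used the FileZilla GUI, but a command-line equivalent with the OpenSSH sftp client might look roughly like the sketch below. The function name pull_from_vm, the user, the IP address and the paths are all placeholders, and it assumes key-based SSH login to the VM already works, because sftp batch mode cannot prompt for a password. I have not compared its speed against FileZilla.

```shell
# Hypothetical helper: fetch one file from the VM over SFTP in batch mode.
# Run this on your local machine, not on the VM.
pull_from_vm() {
    remote="$1"       # e.g. user@EXTERNAL_IP (placeholder)
    remote_path="$2"  # file on the VM, already copied out of the container
    local_dir="$3"    # local directory to download into
    mkdir -p "$local_dir" || return 1
    # "-b -" makes sftp read its commands from standard input.
    printf 'get "%s" "%s"\n' "$remote_path" "$local_dir" | sftp -b - "$remote"
}

# Example call (placeholder values):
#   pull_from_vm user@EXTERNAL_IP /tmp/colab-export/data.zip ./downloads
```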

Adrot
  • Thanks for validating! I've not been able to check the answer above, but will accept now. – s_go Jul 26 '23 at 13:32
0

Google Colab on GCE runs in its own Docker container, as you found. To access the files in the Colab session, run docker ps and copy the container ID from the bottom row. To copy a file over, run docker cp (your container id):/path/to/google/colab/folder/ /path/to/gce/
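Put together, the steps could be sketched like this on the VM over SSH. The function name copy_from_colab, the container ID and the destination directory are placeholders, and /content as the source path is an assumption taken from the question (it is where the asker expected the data to be). If your user can't talk to the Docker daemon, you may need to prefix the docker commands with sudo.

```shell
# Hypothetical helper: copy a path out of the Colab container onto the VM disk.
# Usage: copy_from_colab <container-id> <path-in-container> <destination-on-vm>
copy_from_colab() {
    container_id="$1"  # taken from the CONTAINER ID column of `docker ps`
    src_path="$2"      # path inside the container (assumed: /content)
    dest_path="$3"     # directory on the VM to copy into
    mkdir -p "$dest_path" || return 1
    docker cp "${container_id}:${src_path}" "$dest_path"
}

# Typical session (placeholder values):
#   docker ps                      # list running containers, note the ID
#   copy_from_colab YOUR_CONTAINER_ID /content /tmp/colab-export
```

Once the files are on the VM's own file system, they can be downloaded with any normal SSH/SFTP tool.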

hidude562
  • Thanks for the answer @hidude562! I've moved to a different data analysis solution, so haven't been able to validate this answer. Based on the post by @Adrot I'm going to accept the answer. – s_go Jul 26 '23 at 13:32