I'm looking for a simple, encapsulated way to port my existing programs that use open("my_file.txt") to Colaboratory with minimal changes to the existing logic flow. I'm happy to cut and paste some setup logic ahead of my existing code.

The mental model I get from Google (here) is that I have to complete these prerequisites to get my file loaded:

  1. Upload the file to Google Drive
  2. Download it to the Python VM (probably into /tmp)

Then I can execute my existing code without change.

I therefore suspect/propose that what would work for me (and not just me!) is an interface/function as follows:

  • inputs (from local computer)
    • source_file_dir
    • source_file_name
    • (of course authentication inputs are implicitly required)
  • output
    • python_vm_file_dir (dir I can use in my program; /tmp is fine)
    • (implicitly I expect the same dest_file_name)
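
To make the request concrete, the interface above might look roughly like the following sketch. The name stage_file and the transfer parameter are hypothetical. shutil.copy is only a stand-in so the sketch runs anywhere; in Colaboratory the transfer step would have to be whatever upload/download mechanism is actually used, authentication included.

```python
import os
import shutil


def stage_file(source_file_dir, source_file_name,
               python_vm_file_dir="/tmp", transfer=shutil.copy):
    """Place source_file_name in python_vm_file_dir and return its new path.

    transfer(src, dest) is a hypothetical hook for the real transfer step;
    the default shutil.copy is a local stand-in, not a Colaboratory API.
    """
    src = os.path.join(source_file_dir, source_file_name)
    # The destination keeps the same file name, as requested above.
    dest = os.path.join(python_vm_file_dir, source_file_name)
    os.makedirs(python_vm_file_dir, exist_ok=True)
    transfer(src, dest)
    return dest
```

Existing code would then only need to open the returned path instead of the original one.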

With a code snippet like this, I could easily move code into Colaboratory.

Has anyone already created this?

Thank you.

Tim Misner

2 Answers

I've been tackling similar questions. In terms of simplicity, I found keeping data files in Google Cloud Storage the easiest. It's quite well explained in the tutorial: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb

I've found the easiest thing to do is to insert cells that copy the data to the VM running the notebook:

!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

That way I can generally leave the 'active' code blocks the same as the ones I run locally.
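
If you'd rather do the copy from Python than from a shell cell, a thin wrapper is enough. This is only a minimal sketch: build_gsutil_cp, fetch_from_gcs and the run parameter are names made up for illustration. It assumes gsutil is on the PATH and already authenticated, and it keeps the object's base name in the destination directory.

```python
import posixpath
import subprocess


def build_gsutil_cp(bucket_name, object_name, dest_dir="/tmp"):
    """Return the gsutil command list and the local destination path."""
    src = f"gs://{bucket_name}/{object_name}"
    # Keep the same file name on the VM as in the bucket.
    dest = posixpath.join(dest_dir, posixpath.basename(object_name))
    return ["gsutil", "cp", src, dest], dest


def fetch_from_gcs(bucket_name, object_name, dest_dir="/tmp",
                   run=subprocess.run):
    """Copy one object from a bucket to dest_dir and return the local path.

    run is injectable (hypothetical hook) so the command can be inspected
    without calling gsutil.
    """
    cmd, dest = build_gsutil_cp(bucket_name, object_name, dest_dir)
    run(cmd, check=True)  # raises CalledProcessError if gsutil fails
    return dest
```

The rest of the notebook can then open the returned path exactly as it would a local file.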

I use a Chromebook when I'm out and about, so I like to keep as much in the cloud as possible. It's quite easy to set up a 'mapped network drive' (in Windows speak) to a GCS bucket for moving files around. It's also very easy on Linux. On Windows, I found this utility really handy: https://www.cloudberrylab.com/drive/google-cloud.aspx (not an advert, I'm just a fan).

Peter Coghill

Upload to Google Drive. Here is a code snippet to access it directly.

# Install google-drive-ocamlfuse and its dependencies
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
# Authenticate this Colaboratory session and fetch the default credentials
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
# Print the authorization URL, then paste the verification code at the prompt
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Now create a drive directory and mount Google Drive on it:

!mkdir -p drive
!google-drive-ocamlfuse drive

You can then access any file in Google Drive as drive/Filename.

For example:

import pandas
df = pandas.read_hdf("drive/Colab Notebooks/S2C5_complete_cleaned_by_me_10percent.h5")

Also, you only need to do this once, in one notebook. After that you can access the data from other notebooks as well.
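
If existing code expects plain paths, a small helper can resolve names against the mount point and fail early when Drive isn't mounted. This is a minimal sketch: drive_path is a hypothetical name, and it assumes the mount directory is drive, as created above.

```python
import os


def drive_path(relative_path, mount_dir="drive"):
    """Return the path of a Drive file under mount_dir, checking it exists."""
    if not os.path.isdir(mount_dir):
        raise FileNotFoundError(
            f"mount point {mount_dir!r} not found; run the mount cells first")
    full_path = os.path.join(mount_dir, relative_path)
    if not os.path.exists(full_path):
        raise FileNotFoundError(f"{full_path!r} does not exist in Drive")
    return full_path
```

Existing calls such as open("my_file.txt") would then become open(drive_path("my_file.txt")), with the rest of the logic unchanged.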