Accessing '.pickle' file in Google Colab

Question

I am fairly new to using Google's Colab as my go-to tool for ML.

In my experiments I need the 'notMNIST' dataset, which I have saved as notMNIST.pickle in my Google Drive, in a folder called Data.

I want to access this '.pickle' file from my Google Colab notebook so that I can use the data.

Is there a way I can access it?

I have read the documentation and some questions on StackOverflow, but they cover uploading and downloading files and/or working with 'Sheets'.

However, what I want is to load the notMNIST.pickle file in the environment and use it for further processing.

Any help will be appreciated.

Thanks!

How did you solve the issue? I have the same issue and cannot figure it out. Please help me if you can. Thanks. — user4704857, Feb 21 '19 at 04:10

score 8 · Answer 1 · answered Sep 28 '18 at 19:46


You can try the following:

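from google.colab import drive
import pickle

# Mount Google Drive into the Colab VM (prompts for authorization).
drive.mount('/content/drive')

# Your Drive files appear under 'MyDrive' (older runtimes used 'My Drive').
DATA_PATH = "/content/drive/MyDrive/Data"
with open(DATA_PATH + '/notMNIST.pickle', 'rb') as infile:
    data = pickle.load(infile)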


Bert Carremans


score 2 · Answer 2 · answered Mar 10 '18 at 07:56

The data in Google Drive lives in the cloud, whereas Colaboratory gives you a personal Linux virtual machine on which your notebooks run. So you need to download the file from Google Drive to your Colaboratory virtual machine before you can use it. You can follow this download tutorial.
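As a rough illustration of getting the file onto the virtual machine, here is a minimal sketch that copies a file onto the VM's local disk. It assumes Drive has already been mounted in the notebook; the helper name `copy_to_local` and the default target `/content` are hypothetical choices for this example, not part of any Colab API.

```python
import shutil
from pathlib import Path


def copy_to_local(src, dst_dir="/content"):
    """Copy a file (e.g. from a mounted Drive folder) onto the VM's local disk.

    `copy_to_local` and the default `dst_dir` are illustrative assumptions.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    dst = dst_dir / Path(src).name              # keep the original file name
    shutil.copy(src, dst)
    return dst
```

In a notebook this might be called as `copy_to_local('/content/drive/MyDrive/Data/notMNIST.pickle')`, assuming that is where the mounted file actually sits.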

score 2 · Accepted Answer · answered Apr 01 '19 at 15:11

Thanks, guys, for your answers. Google Colab has quickly grown into a more mature development environment, and my favorite feature is the 'Files' tab.

We can easily upload the file to the folder we want and access it as if it were on a local machine.
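Once the file has been uploaded through the 'Files' tab, reading it is ordinary local file access. A minimal sketch follows; the helper name `load_pickle` is hypothetical, and the path you pass depends on where you dropped the file.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickled object from a path on the Colab VM's local disk."""
    path = Path(path)
    if not path.is_file():
        # Fail early with a hint instead of a bare open() error.
        raise FileNotFoundError(f"No file at {path}; check its location in the Files tab")
    with path.open("rb") as infile:
        return pickle.load(infile)
```

For example, `data = load_pickle('/content/notMNIST.pickle')`, assuming the file was uploaded to `/content`.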

This solves the issue.

Thanks.

score 1 · Answer 4 · answered Feb 20 '19 at 02:30

You can use pydrive for that. First, you need to find the ID of your file.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# List matching files in the Drive root to find the file ID.
# Adjust the query to your own file name and folder.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
listed = drive.ListFile({'q': "title contains '.pkl' and 'root' in parents"}).GetList()
for file in listed:
    print('title {}, id {}'.format(file['title'], file['id']))

You can then load the file using the following code:

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

import io
import pickle
from googleapiclient.http import MediaIoBaseDownload

file_id = 'laggVyWshwcyP6kEI-y_W3P8D26sz'

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while not done:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()

downloaded.seek(0)
f = pickle.load(downloaded)
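Before using the loaded object for further processing, it can help to check what it contains. The sketch below assumes the pickle may hold a dict of arrays, which is only a guess about how the notMNIST data was saved; the helper name `summarize` is hypothetical.

```python
def summarize(obj):
    """Describe a loaded object: per-key shapes for a dict, else its type name."""
    if isinstance(obj, dict):
        # Array-like values expose .shape; fall back to the type name otherwise.
        return {key: getattr(value, "shape", type(value).__name__)
                for key, value in obj.items()}
    return type(obj).__name__
```

Calling `print(summarize(f))` on the object loaded above shows its keys and array shapes if it is a dict, or just its type otherwise.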
