216

What are the common ways to import private data into Google Colaboratory notebooks? Is it possible to import a non-public Google sheet? You can't read from system files. The introductory docs link to a guide on using BigQuery, but that seems a bit... much.

Vishal Yadav
Grae

24 Answers

252

An official example notebook demonstrating local file upload/download and integration with Drive and sheets is available here: https://colab.research.google.com/notebooks/io.ipynb

The simplest way to share files is to mount your Google Drive.

To do this, run the following in a code cell:

from google.colab import drive
drive.mount('/content/drive')

It will ask you to visit a link and ALLOW "Google Files Stream" to access your Drive. A long alphanumeric auth code is then shown, which you need to enter in your Colab notebook.

Afterward, your Drive files will be mounted and you can browse them with the file browser in the side panel.


Here's a full example notebook

Asheet_s
Bob Smith
  • 4
    A sheets example is now included in a bundled example notebook that also includes recipes for Drive and Google Cloud Storage: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb – Bob Smith Oct 30 '17 at 18:03
  • 17
    Can I import a specific folder in my Drive? I'm sharing this colab with someone else, and I don't want to give access to all my google drive which contains sensitive information – fabda01 Sep 12 '18 at 19:10
  • 5
    Files in your Drive won't be shared if you share the notebook. The user will still need to mount their own drive, which is separate. You can share the files with that user if needed, but all of that is controlled by normal Drive ACLs. Sharing a Colab notebook shares only the notebook, not the Drive files referenced in that notebook. – Bob Smith Sep 12 '18 at 20:10
  • 1
    my mount is successful but I can't see the files listing in the left side under files. Any suggestions? – Swapnil B. Oct 30 '18 at 02:36
  • Did you hit 'Refresh' in the file browser? Did you mount under `/content`? – Bob Smith Oct 30 '18 at 04:41
  • 8
    Do not train on the data in mounted google drive. First copy the data to local drive and then train on it. It will be nearly 10 times faster. For faster copy, make sure the data files are big archives or a number of smaller ones. For example:- Do not use 100000 image files. Use 100 archives of 1000 images each. This way uploading to google drive is also faster and so is the copying from google drive to colab – saurabheights Jan 11 '19 at 02:31
  • @BobSmith, I'm trying to install packages on GOOGLE COLAB, but I'm facing errors while doing that. I could import the main module without any error, however, if I try to import submodule(gym_robot), I get an error, ImportError: cannot import name 'gym_robot' from 'gym'. [This my complete notebook](https://colab.research.google.com/drive/1nwOpIlgmIppD5_umHCd_THmnFuCOlvuV#scrollTo=W4VatWmozWPa) – zoraiz ali Nov 05 '21 at 14:09
84

Upload

from google.colab import files
files.upload()

Download

files.download('filename')

List directory

import os
os.listdir()
starriet
井上智文
34

Step 1: Mount your Google Drive in Colaboratory

from google.colab import drive
drive.mount('/content/gdrive')

Step 2: You will now see your Google Drive files in the left pane (file explorer). Right-click the file you need to import and select Copy path. Then import as usual in pandas, using the copied path.

import pandas as pd
df=pd.read_csv('/content/gdrive/MyDrive/data.csv')

Done!

codejockie
Garima Jain
  • 3
    Wins on clarity and brevity and has equal effectiveness. I see no advantage to the much more involved ways to do this. – Elroch Feb 07 '20 at 11:57
23

A simple way to import data from your Google Drive. Doing this saves people time (I don't know why Google doesn't just list these steps explicitly).

INSTALL AND AUTHENTICATE PYDRIVE

     !pip install -U -q PyDrive ## you will have to install this for every Colab session

     from pydrive.auth import GoogleAuth
     from pydrive.drive import GoogleDrive
     from google.colab import auth
     from oauth2client.client import GoogleCredentials

     # 1. Authenticate and create the PyDrive client.
     auth.authenticate_user()
     gauth = GoogleAuth()
     gauth.credentials = GoogleCredentials.get_application_default()
     drive = GoogleDrive(gauth)

UPLOADING

if you need to upload data from local drive:

    from google.colab import files

    uploaded = files.upload()

    for fn in uploaded.keys():
       print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

execute and this will display a choose file button - find your upload file - click open

After uploading, it will display:

    sample_file.json(text/plain) - 11733 bytes, last modified: x/xx/2018 - %100 done
    User uploaded file "sample_file.json" with length 11733 bytes
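
As the loop above suggests, files.upload() hands back a dictionary that maps each file name to the raw bytes of that file, so the content can also be parsed straight from memory. Here is a minimal sketch of that idea. The `uploaded` dictionary below is a hand-made stand-in (an assumption, so the snippet runs outside Colab), and `json_from_upload` is a hypothetical helper name, not part of google.colab.

```python
import json


def json_from_upload(uploaded, name, encoding="utf-8"):
    """Decode one uploaded file's bytes and parse them as JSON."""
    return json.loads(uploaded[name].decode(encoding))


# Stand-in for the value returned by files.upload()
# (assumed shape: {file name: file content as bytes}).
uploaded = {"sample_file.json": b'{"name": "example", "values": [1, 2, 3]}'}

data = json_from_upload(uploaded, "sample_file.json")
print(data["values"])
```

In a real session you would replace the stand-in dictionary with `uploaded = files.upload()`.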

CREATE FILE FOR NOTEBOOK

If your data file is already in your gdrive, you can skip to this step.

Now it is in your google drive. Find the file in your google drive and right click. Click get 'shareable link.' You will get a window with:

    https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn

Copy - '29PGh8XCts3mlMP6zRphvnIcbv27boawn' - that is the file ID.
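
If you would rather not copy the ID by hand, it can be pulled out of the link with the standard library. This is only a sketch: `drive_file_id` is a hypothetical helper, and the second link form it handles (`/file/d/<ID>/view`) is an assumption about what your sharing link may look like, since the link above uses the `open?id=` form.

```python
from urllib.parse import urlparse, parse_qs


def drive_file_id(link):
    """Return the file ID embedded in a Google Drive sharing link, or None."""
    parsed = urlparse(link)
    # Form 1: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    # Form 2 (assumed): https://drive.google.com/file/d/<ID>/view
    parts = [p for p in parsed.path.split("/") if p]
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]
    return None


print(drive_file_id("https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn"))
```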

In your notebook:

    json_import = drive.CreateFile({'id':'29PGh8XCts3mlMP6zRphvnIcbv27boawn'})

    json_import.GetContentFile('sample.json')  # 'sample.json' is the file name that will be accessible in the notebook

IMPORT DATA INTO NOTEBOOK

To import the data you uploaded into the notebook (a json file in this example - how you load will depend on file/data type - .txt,.csv etc. ):

    import json
    sample_uploaded_data = json.load(open('sample.json'))

Now you can print to see the data is there:

    print(sample_uploaded_data)
E G
  • 2
    It is worth pointing out that the *UPLOADING* suggestion, via `google.colab.files.upload()` doesn't seem to work on neither Firefox nor Safari, Chrome only. See [here](https://stackoverflow.com/questions/48420759/upload-local-files-using-google-colab) – 5agado Feb 23 '18 at 13:48
9

The easiest way to upload data into Google Colab through the GUI is to click the Files icon (the third option in the left sidebar). There you get an upload option that opens a file browser, much like the one in Windows. Pick your files in the upload window and you are done.

Alternatively, upload from code:

from google.colab import files
uploaded = files.upload()
sameer_nubia
8

The simplest way I've found is:

  1. Create a repository on GitHub containing your dataset.
  2. Clone your repository with !git clone --recursive [GITHUB LINK REPO]
  3. Find where your data is (!ls command).
  4. Open the file with pandas as you would in a normal Jupyter notebook.
Rafał B.
  • Hi, with this gapminder = pd.read_csv("Data-Analysis/pairplots/data/gapminder_data.csv") I am only getting "version https://.." variable with only 2 observatons – Muku Apr 07 '18 at 11:55
  • 2
    This solution will not work out if a single file size is more than github allowed limit which if 20MB i guess in free version. – Akshay Soam Apr 13 '18 at 20:01
8

This allows you to upload your files through Google Drive.

Run the below code (found this somewhere previously but I can't find the source again - credits to whoever wrote it!):

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Click on the first link that comes up which will prompt you to sign in to Google; after that another will appear which will ask for permission to access to your Google Drive.

Then, run this which creates a directory named 'drive', and links your Google Drive to it:

!mkdir -p drive
!google-drive-ocamlfuse drive

If you do a !ls now, there will be a directory drive, and if you do a !ls drive you can see all the contents of your Google Drive.

So for example, if I save my file called abc.txt in a folder called ColabNotebooks in my Google Drive, I can now access it via a path drive/ColabNotebooks/abc.txt

Stephen Rauch
yl_low
7

On the left bar of any colaboratory there is a section called "Files". Upload your files there and use this path

"/content/YourFileName.extension"

ex: pd.read_csv('/content/Forbes2015.csv');

Vivek Solanki
6

Quick and easy import from Dropbox:

!pip install dropbox
import dropbox
access_token = 'YOUR_ACCESS_TOKEN_HERE' # https://www.dropbox.com/developers/apps
dbx = dropbox.Dropbox(access_token)

# response = dbx.files_list_folder("")

metadata, res = dbx.files_download('/dataframe.pickle2')

with open('dataframe.pickle2', "wb") as f:
  f.write(res.content)
delica
5

Just two lines of code in Colab. Very easy way:

  1. Load all your files in one zip archive to Google Drive.
  2. Make it visible for everyone with a link.
  3. Copy ID from this link. ( For example: In this link https://drive.google.com/open?id=29PGh8XCts3mlMP6zRphvnIcbv27boawn ID is 29PGh8XCts3mlMP6zRphvnIcbv27boawn)
  4. Enter in Colab: !gdown --id 29PGh8XCts3mlMP6zRphvnIcbv27boawn
  5. And last step to enter in Colab: ! unzip file_name.zip

Voilà! All needed files are ready to be used in Colab in /content/file_name.csv

For this easy way to get files from Drive to Colab I thank Gleb Mikhaylov.

DashaSD
4

The simplest solution I have found so far which works perfectly for small to mid-size CSV files is:

  1. Create a secret gist on gist.github.com and upload (or copy-paste the content of) your file.
  2. Click on the Raw view and copy the raw file URL.
  3. Use the copied URL as the file address when you call pandas.read_csv(URL)

This may or may not work for reading a text file line by line or binary files.

Borhan Kazimipour
  • 2
    It's important to note that while secret gists are difficult to discover they are _not_ private, so anyone using this approach should be careful. – Grae Jul 13 '18 at 19:05
4

For those who, like me, came from Google for the keyword "upload file colab":

from google.colab import files
uploaded = files.upload()
Fernando Wittmann
3
  1. You can mount Google Drive by running the following:

    from google.colab import drive
    drive.mount('/content/drive')

  2. Afterwards, for training, copy the data from Google Drive to the Colab root folder:

!cp -r '/content/drive/My Drive/Project_data' '/content'

where the first path is the Google Drive path and the second is the Colab root folder.

Training is faster this way for large datasets.
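
The same copy step can be done from Python, which makes it easy to skip the slow copy when the cell is re-run. This is a rough sketch, not the only way to do it: `stage_locally` is a hypothetical helper, and the demo uses temporary folders as stand-ins for the mounted Drive and for /content so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def stage_locally(src_dir, dest_root):
    """Copy src_dir into dest_root once and return the path of the local copy."""
    src = Path(src_dir)
    dest = Path(dest_root) / src.name
    if not dest.exists():  # skip the slow copy when the cell is re-run
        shutil.copytree(src, dest)
    return dest


# Demo with temporary folders standing in for Drive and /content (assumption).
with tempfile.TemporaryDirectory() as tmp:
    fake_drive = Path(tmp) / "drive" / "Project_data"
    fake_drive.mkdir(parents=True)
    (fake_drive / "train.csv").write_text("x,y\n1,2\n")

    local = stage_locally(fake_drive, Path(tmp) / "content")
    staged_files = sorted(p.name for p in local.iterdir())
    print(staged_files)
```

In Colab the call would look like stage_locally('/content/drive/My Drive/Project_data', '/content'), using the paths from the cp command above.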

asheer qureshi
3

I created a small chunk of code that can do this in multiple ways. You can

  1. Use already uploaded file (useful when restarting kernel)
  2. Use file from Github
  3. Upload file manually
import os.path

filename = "your_file_name.csv"
if os.path.isfile(filename):
  print("File already exists. Will reuse the same ...")
else:
  use_github_data = False  # Set this to True if you want to download from Github
  if use_github_data:
    print("Loading file from Github ...")
    # Change the link below to the raw URL of the file in the repo
    filename = "https://raw.githubusercontent.com/ngupta23/repo_name/master/your_file_name.csv"
  else:
    print("Please upload your file to Colab ...")
    from google.colab import files
    uploaded = files.upload()
Nikhil Gupta
2

You can also use my implementations on google.colab and PyDrive at https://github.com/ruelj2/Google_drive which makes it a lot easier.

!pip install -U -q PyDrive
import os  
os.chdir('/content/')  
!git clone https://github.com/ruelj2/Google_drive.git  

from Google_drive.handle import Google_drive  
Gd = Google_drive()  

Then, if you want to load all files in a Google Drive directory, just

Gd.load_all(local_dir, drive_dir_ID, force=False)  

Or just a specific file with

Gd.load_file(local_dir, file_ID)
Jean-Christophe
  • 1
    In this case what is "drive_dir_ID?" – Parseltongue Dec 07 '18 at 00:31
  • 1
    As mentioned in the git repo, drive_dir_ID is the corresponding Google Drive ID of the requested directory. For more info, please check https://github.com/ruelj2/Google_drive. There is also a clear example of usage. – Jean-Christophe Dec 07 '18 at 15:24
2

If this is your first time using Google Colab, mount your Drive first:

from google.colab import drive
drive.mount('/content/drive')

Run this code, follow the link in the output, then paste the passphrase into the box.

When you need a file's path, right-click the file and copy its path. Don't forget to remove the leading "/content/" from the copied path:

f = open("drive/My Drive/RES/dimeric_force_field/Test/python_read/cropped.pdb", "r")
Hutch
1

As mentioned by @Vivek Solanki, I also uploaded my file on the colaboratory dashboard under "File" section. Just take a note of where the file has been uploaded. For me, train_data = pd.read_csv('/fileName.csv') worked.

Ishani
1

Another simple way to do it with Dropbox would be:

Put your data into dropbox

Copy the file sharing link of your file

Then do wget in colab.

Eg: !wget -O filename filelink (for example, https://www.dropbox.com/.....)

And you're done. The data will start appearing in your colab content folder.
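
One thing to watch for: a Dropbox sharing link that ends in dl=0 normally opens a preview page, and switching it to dl=1 requests the file itself. Here is a small sketch of that rewrite. `dropbox_direct_link` is a hypothetical helper, and the link in the example is made up for illustration.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse


def dropbox_direct_link(share_link):
    """Return the sharing link with dl=1 set, keeping any other query parameters."""
    parsed = urlparse(share_link)
    query = dict(parse_qsl(parsed.query))
    query["dl"] = "1"  # dl=0 shows the preview page, dl=1 asks for the file
    return urlunparse(parsed._replace(query=urlencode(query)))


# Made-up link, for illustration only.
print(dropbox_direct_link("https://www.dropbox.com/s/abc123/data.csv?dl=0"))
```

The resulting link is what you would pass to wget as filelink.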

P S
0

It has been solved, find details here and please use the function below: https://stackoverflow.com/questions/47212852/how-to-import-and-read-a-shelve-or-numpy-file-in-google-colaboratory/49467113#49467113

from google.colab import files
import zipfile, io, os

def read_dir_file(case_f):
    # author: yasser mustafa, 21 March 2018
    # case_f = 0 for uploading one File and case_f = 1 for uploading one Zipped Directory
    uploaded = files.upload()    # to upload a Full Directory, please Zip it first (use WinZip)
    for fn in uploaded.keys():
        name = fn  #.encode('utf-8')
        #print('\nfile after encode', name)
        #name = io.BytesIO(uploaded[name])
    if case_f == 0:    # case of uploading 'One File only'
        print('\n file name: ', name)
        return name
    else:   # case of uploading a directory and its subdirectories and files
        zfile = zipfile.ZipFile(name, 'r')   # unzip the directory
        zfile.extractall()
        for d in zfile.namelist():   # d = directory
            print('\n main directory name: ', d)
            return d
print('Done!')
Yasser M
0

Here is one way to import files from google drive to notebooks.

Open a Jupyter notebook, run the code below, and complete the authentication process.

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Once you are done with the code above, run the code below to mount Google Drive:

!mkdir -p drive
!google-drive-ocamlfuse drive

Importing files from Google Drive into notebooks (e.g. Colab_Notebooks/db.csv)

Let's say your dataset file is in the Colab_Notebooks folder and is named db.csv:

import pandas as pd
dataset=pd.read_csv("drive/Colab_Notebooks/db.csv")

I hope it helps

Community
Ravi
0

If you want to do this without code, it's pretty easy. Zip your folder; in my case it is

dataset.zip

Then, in Colab, right-click the folder where you want to put the file, press Upload, and upload the zip file. After that, run this Linux command:

!unzip <your_zip_file_name>

You can then see that your data has been uploaded successfully.
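
If you prefer to stay in Python rather than use the shell, the standard zipfile module does the same job as the unzip command. This is a minimal sketch: `extract_archive` is a hypothetical helper, and the demo builds a tiny archive in a temporary folder (an assumption, so it runs without any uploaded file).

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the names of the extracted members."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Demo: build a small archive in a temporary folder, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "dataset.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("dataset/train.csv", "x,y\n1,2\n")

    members = extract_archive(zip_path, Path(tmp) / "data")
    extracted_ok = (Path(tmp) / "data" / "dataset" / "train.csv").is_file()
    print(members, extracted_ok)
```

In Colab you would call it on the uploaded file, for example extract_archive('dataset.zip', '/content').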

0

If the dataset is smaller than 25 MB, the easiest way to load a CSV file is from your GitHub repository.

  1. Click on the dataset in the repository.
  2. Click the View Raw button.
  3. Copy the link and store it in a variable.
  4. Pass the variable to pandas read_csv to get the dataframe.

Example:

import pandas as pd
url = 'copied_raw_data_link'
df1 = pd.read_csv(url)
df1.head()
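
If you only have the regular GitHub page link (the one containing /blob/), the raw link can also be derived from it instead of clicking through. A small sketch under that assumption follows; `github_raw_url` is a hypothetical helper, and the user and repository names in the example are made up.

```python
from urllib.parse import urlparse


def github_raw_url(blob_url):
    """Turn a github.com '/blob/' page link into its raw.githubusercontent.com link."""
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path...>
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    user, repo, _, *rest = parts
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Made-up repository, for illustration only.
print(github_raw_url("https://github.com/someuser/somerepo/blob/master/data/file.csv"))
```

The returned string can be passed to pd.read_csv in place of 'copied_raw_data_link'.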
Lax
0

You can use the below function. I am assuming that you are trying to upload a data frame sort of file (.csv, .xlsx)

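from google.colab import files
import pandas as pd

def file_upload():
    file = files.upload()
    path = f"/content/{list(file.keys())[0]}"
    # Pick the reader that matches the uploaded file's extension
    df = pd.read_csv(path) if path.endswith(".csv") else pd.read_excel(path)
    return df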

#your file will be saved in the variable: dataset
dataset = file_upload()

If you have not changed the working directory in Google Colab, this is the easiest way.

0

from google.colab import drive

drive.mount('/content/drive')

import pandas as pd
dv=pd.read_csv('/content/drive/MyDrive/Diana/caso/Data_Caso_Propuesto.csv')
dv.info()

  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Mátyás Grőger Sep 15 '22 at 13:12