14

I have started using Google Colab to train neural networks, but my data files are quite large (4GB and 18GB). All of this data is currently stored in OneDrive, and I don't have enough space on my Google Drive to transfer the files over.

Is there a way for me to access the data in OneDrive directly from Google Colab?

I have tried loading the data from my own machine, but this is too time-consuming and my machine doesn't really have enough space to store these files. I have also tried adding download=1 after the ? in the file's hyperlink, but this does not download the file and only displays the hyperlink. Using wget produces an 'ERROR 403: Forbidden.' message.

I would like the Colab notebook to download the zipped file and unzip the data from it in order to perform training.

JoshWilde
  • I think this method could save you a lot of time. First, put all your data files in an archive (.rar or .zip). You can create a private repo on GitHub and then upload the archive to the repo. There, you have an option to view the raw file. Open that link. Now you can open this URL with Python in Google Colab and extract all the files. – Shubham Panchal Apr 17 '19 at 15:59
  • Thank you, but the problem with this approach is that I only have 1GB of space on my GitHub account, whilst I have 1TB of space on OneDrive. I expect to be using more data in the future, so I want a system in place for when I have that data. That is why it's important that the data is collected straight from OneDrive. – JoshWilde Apr 18 '19 at 16:42
  • @JoshWilde did you manage to resolve the problem and access OneDrive from Colab? – E.gh Mar 19 '20 at 15:32

3 Answers

6

OK, here is a method for downloading straight to Colab. Select the file in OneDrive and start the download, but pause it immediately.

Then open the browser's download list, right-click the paused item, and copy the link address.

!wget --no-check-certificate \
  "https://public.sn.files.1drv.com/xxx" \
  -O /content/filename.zip

Note: the copied link becomes invalid after a few minutes.
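
For anyone who would rather stay in Python than call wget, the same download can be sketched with the standard library. This is only a minimal sketch, assuming the copied link is still valid when the cell runs. `download_file` is a hypothetical helper name, and the URL in the commented call is the same placeholder as in the wget command, not a real address. Unlike the wget call, this leaves certificate verification switched on.

```python
import shutil
import urllib.request


def download_file(url, dest_path, chunk_size=1024 * 1024):
    """Stream the response at `url` to `dest_path` in chunks.

    Copying in chunks avoids holding a multi-gigabyte file in memory.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file, chunk_size)
    return dest_path


# Example call (placeholder link copied from the paused browser download):
# download_file("https://public.sn.files.1drv.com/xxx", "/content/filename.zip")
```

Because the link is short-lived, the cell should be run soon after copying the address.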

yuhang.tao
3

You can use the OneDrive SDK for Python (onedrivesdk), which is available from PyPI.

First, install it in Google Colab:

!pip install onedrivesdk

The full process is too long to cover here. In short, you first need to authenticate yourself, and then you can upload and download files easily.

You can authenticate using this code:

import onedrivesdk

redirect_uri = 'http://localhost:8080/'
client_secret = 'your_client_secret'
client_id = 'your_client_id'
api_base_url = 'https://api.onedrive.com/v1.0/'
scopes = ['wl.signin', 'wl.offline_access', 'onedrive.readwrite']

http_provider = onedrivesdk.HttpProvider()
auth_provider = onedrivesdk.AuthProvider(
    http_provider=http_provider,
    client_id=client_id,
    scopes=scopes)
client = onedrivesdk.OneDriveClient(api_base_url, auth_provider, http_provider)
auth_url = client.auth_provider.get_auth_url(redirect_uri)

# Ask for the code
print('Paste this URL into your browser, approve the app\'s access.')
print('Copy everything in the address bar after "code=", and paste it below.')
print(auth_url)
code = input('Paste code here: ')

client.auth_provider.authenticate(code, redirect_uri, client_secret)

This prints a URL that you open in your browser. After approving access, copy the code from the address bar and paste it into the console to authenticate yourself.

You can download a file using:

root_folder = client.item(drive='me', id='root').children.get()
id_of_file = root_folder[0].id
client.item(drive='me', id=id_of_file).download('./path_to_file')
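
The snippet above downloads whichever item happens to come first in the root folder. A small lookup by file name may be more practical. This is a rough sketch, assuming each item returned by `children.get()` exposes `name` and `id` attributes. `find_item_id` and the `data.zip` file name are hypothetical and not part of the SDK.

```python
def find_item_id(items, filename):
    """Return the id of the first item whose name equals `filename`, or None."""
    for item in items:
        if item.name == filename:
            return item.id
    return None


# Hypothetical usage with the objects from the snippet above:
# file_id = find_item_id(root_folder, 'data.zip')
# if file_id is not None:
#     client.item(drive='me', id=file_id).download('/content/data.zip')
```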
Shubham Panchal
  • code doesn't work, gives 'raw_input does not exist' error – SantoshGupta7 May 12 '19 at 18:18
  • 2
    get an error when installing onedrivesdk. `ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.` – Joe Sep 19 '19 at 04:34
  • 1
    @Joe try running !pip install "onedrivesdk<2" instead of !pip install onedrivesdk. More details: https://github.com/OneDrive/onedrive-sdk-python/issues/166 – infoshoc Oct 16 '20 at 08:21
  • This worked for me too but by pasting the code I get "We're unable to complete your request" to even enable apps access to my computer – ablam Nov 04 '22 at 18:49
1

If you only need to download (this also works for folders), these browser add-ons can help:

  • cliget in Firefox (wget didn't work for me, but curl is fine)
  • curlwget in Chrome (sorry, I haven't tried it, as I don't use Chrome)

With cliget, you just have to install the add-on in Firefox and then start a download of the folder. (You don't have to actually finish it.) Then click on cliget among the add-on icons, choose curl, and copy and paste the generated command.

Note: these are not 'safe' methods, so they probably shouldn't be used with sensitive content.

(Other OneDrive folders probably stay safe, but I'm not sure. Please confirm if you know.)

To unzip, you can use the unzip command.
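
The unzip step can also be done from Python with the standard zipfile module. This is a minimal sketch. `extract_zip` is a hypothetical helper name, and the paths in the commented call are illustrative only.

```python
import zipfile


def extract_zip(zip_path, out_dir):
    """Extract every member of the archive at `zip_path` into `out_dir`.

    Returns the list of member names so the caller can check what was unpacked.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


# Example call (illustrative paths):
# extract_zip("/content/filename.zip", "/content/data")
```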

A year has passed since the question was asked, but I'll leave this here for others. :)

Edit:

For many small files it seems to be really slow, for some reason. (I'm not sure why.) Also, with OneDrive it seems to be reliable only up to a few (2-3) GB... :(

fanyul