63

I already have a zip of (2K images) dataset on a google drive. I have to use it in a ML training algorithm. Below Code extracts the content in a string format:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))

But I have to extract and store it in a separate directory as it would be easier for processing (as well as for understanding) of the dataset.

I tried to extract it further, but getting "Not a zipfile error"

dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()

Google Drive Dataset

Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Laxmikant
  • 2,046
  • 3
  • 30
  • 44

16 Answers16

114

You can simply use this

!unzip file_location
Zeeshan Ali
  • 137
  • 1
  • 10
Harsh Gupta
  • 1,433
  • 1
  • 10
  • 7
90

TO unzip a file to a directory:

!unzip path_to_file.zip -d path_to_directory
giapnh
  • 2,950
  • 24
  • 20
  • Works great. Fast. Example !unzip drive/MyDrive/feedback-prize-2021/train.zip -d drive/MyDrive/feedback-prize-2021/train/ – Sahar Millis Feb 12 '23 at 10:39
28

To extract Google Drive zip from a Google colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
Alon Lavian
  • 1,149
  • 13
  • 14
20

Colab research team has a notebook for helping you out.

Still, in short, if you are dealing with a zip file, like for me it is mostly thousands of images and I want to store them in a folder within drive then do this --

!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"

-u part controls extraction only if new/necessary. It is important if suddenly you lose connection or hardware switches off.

-d creates the directory and extracted files are stored there.

Of course before doing this you need to mount your drive

from google.colab import drive 
drive.mount('/content/drive')

I hope this helps! Cheers!!

Suvo
  • 1,318
  • 11
  • 13
9

First, install unzip on colab:

!apt install unzip

then use unzip to extract your files:

!unzip  source.zip -d destination.zip
  • what if the file is protected with a password? I tried to add -pPassword but it does not work, I found this error "caution: filename not matched: MyPassword" – Naham Al-Zuhairi Apr 17 '23 at 17:33
6

Mount GDrive:

from google.colab import drive
drive.mount('/content/gdrive')

Open the link -> copy authorization code -> paste that into the prompt and press "Enter"

Check GDrive access:

!ls "/content/gdrive/My Drive"

Unzip (q stands for "quiet") file from GDrive:

!unzip -q "/content/gdrive/My Drive/dataset.zip"
Community
  • 1
  • 1
Plo_Koon
  • 2,953
  • 3
  • 34
  • 41
  • 1
    for unzip isn't it I/O-wise better to first copy the zip file to local Colab storage then do the unzip operation? – Farzan Feb 22 '20 at 18:51
6

First create a new directory:

!mkdir file_destination

Now, it's the time to inflate the directory with the unzipped files with this:

!unzip file_location -d file_destination
sargupta
  • 953
  • 13
  • 25
5

For Python

Connect to drive,

from google.colab import drive
drive.mount('/content/drive')

Check for directory

!ls and !pwd

For unzip

!unzip drive/"My Drive"/images.zip
Krunal Kapadiya
  • 2,853
  • 3
  • 16
  • 37
abdul
  • 526
  • 1
  • 5
  • 10
5

Please use this command in google colab

Unzip the file you want to extract and then the location

!unzip "drive/My Drive/Project/yourfilename.zip" -d "drive/My Drive/Project/yourfolder"
MD Mushfirat Mohaimin
  • 1,966
  • 3
  • 10
  • 22
Floyd Fernandes
  • 147
  • 1
  • 4
4

After mounting on drive, use shutil.unpack_archive. It works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:

import shutil
shutil.unpack_archive("filename", "path_to_extract")
2

This is what worked for me.

!apt install unzip

Then I used this code to unzip the file

!unzip /content/file.zip -d /content/

Without installing unzip on Colab first you'll always receive error messages.

1

Instead of GetContentString(), use GetContentFile() instead. It will save the file instead of returning the string.

downloaded.GetContentFile('images.zip') 

Then you can unzip it later with unzip.

korakot
  • 37,818
  • 16
  • 123
  • 144
  • It seemed to work, but was too fast perhaps. But then unzip didn't work. Archive: flowers.zip End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive. unzip: cannot find zipfile directory in one of flowers or flowers.zip, and cannot find flowers.ZIP, period. – Rafael_Espericueta Jul 22 '19 at 22:27
1

SIMPLE WAY TO CONNECT

1) You'll have to verify authentication

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()

2)To fuse google drive

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

3)To verify credentials

import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

4)Create a drive name to use it in colab ('gdrive') and check if it's working

!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
!cd gdrive
Vaishnavi Bala
  • 129
  • 1
  • 4
0

Try this:

!unpack file.zip

If its now working or file is 7z try below

!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar

Or

!pip install pyunpack
!pip install patool

from pyunpack import Archive
Archive(‘file_name.tar.7z’).extractall(‘path/to/’)
!tar -xvf file_name.tar
0

We assumed that you're already mounting your googleDrive to googleColab. in case if you just want to extract zip file thats contains .csv extention. just call pandas attribute read_csv

pd.read_csv('/content/drive/My Drive/folder/example.zip')
-2

in my idea, you must go to a certain path for example:

from google.colab import drive drive.mount('/content/drive/') cd drive/MyDrive/f/

then :

!apt install unzip !unzip zip_folder.zip -d unzip_folder enter image description here