How to unzip a file in a specific folder in colaboratory environment after download it?

Question

I've looking for a solution to solve the slow upload speed of images dataset on google colab when i use a connection from GoogleDrive. Using the follow code:

from google.colab import drive

drive.mount('/content/gdrive')

Using this procedure i can upload images and create labels using a my def load_dataset:

'train_path=content/gdrive/MyDrive/Capstone/Enviroment/cell_images/train'

train_files, train_targets = load_dataset(train_path)

But, as i said, it's very slow, especially because my full dataset is composed by 27560 images.

To solve my problem, i've tried to use this solution.

But now, in order to still use my deffunction, after download the .tar file i wanna extract in a specific folder in the colab enviroment. I found this answer but not solve my problem.

Example:

This is the environment with the test.tar already downloaded.

But i wanna extract the files in the tar file, which structure is train/Uninfected ; train/Parasitized, to get this:

content
- cell_images
  - test
    - Parasitized
    - Uninfected
  - train
    - Parasitized
    - Uninfected
  - valid
    - Parasitized
    - Uninfected

To use the path in def function:

train_path = train_path=content/cell_images/train/'

train_files, train_targets = load_dataset(train_path)

test_path = train_path=content/cell_images/test/'

test_files, test_targets = load_dataset(test_path)

valid_path = train_path=content/cell_images/valid/'

valid_files, valid_targets = load_dataset(valid_path)

I tried to use: ! mkdir -p content/cell_images and !tar -xvf 'test.tar' content/cell_images

But it doesn't work.

Does anyone know how to proceed?

Thanks!

score 21 · Accepted Answer · edited Sep 29 '19 at 08:08

21

To extract the files from the tar archiver to the folder content/cell_images use the command-line option -C:

!tar -xvf  'test.tar' -C 'content/cell_images'

Hope this helps!

edited Sep 29 '19 at 08:08

jottbe

4,228
1
15
31

answered Apr 11 '19 at 20:41

user2314737

27,088
20
102
114

Md Hishamur Rahman · Answer 2 · 2020-03-22T09:53:17.090

15

Although late answer, but might help others:

shutil.unpack_archive works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:

import shutil
shutil.unpack_archive("filename", "path_to_extract")

edited Mar 22 '20 at 09:53

answered Mar 22 '20 at 09:23

Md Hishamur Rahman

334
4
11

score 1 · Answer 3 · answered Nov 27 '19 at 14:24

1

Connect to drive,

from google.colab import drive drive.mount('/content/drive')

Check for directory !ls and !pwd

unzip !unzip drive/"My Drive"/images.zip -d destination

answered Nov 27 '19 at 14:24

abdul

526
1
5
10

score 1 · Answer 4 · answered Dec 30 '21 at 13:56

1

!tar -xvf "cord-19_2021-12-20.tar.gz"

as given here also https://colab.research.google.com/github/sudo-ken/compress-decompress-in-Google-Drive/blob/master/Unrar_Unzip_Rar_Zip_in_GDrive.ipynb

answered Dec 30 '21 at 13:56

Shaina Raza

1,474
17
12

score 0 · Answer 5 · answered Nov 28 '19 at 20:45

0

If your current directory is the default directory, /content, you can unzip your folder project like this:

%%bash
mkdir foldername
tar -xvf '/content/foldername.tar' -C '/content/'

%%bash lets you script without using ! at the beginning of each line.

answered Nov 28 '19 at 20:45

EhsanYaghoubi

145
3
14

How to unzip a file in a specific folder in colaboratory environment after download it?

5 Answers5