
My dataset has the following structure, with three folders labelled 0, 5, and 10, each containing about 200,000 images:

frames_zip:
           ->0
           ->5
           ->10

I have been trying to unzip the archive with the Dataflow Bulk Decompress Cloud Storage Files template, but the folders get decompressed into a single file. I have also tried everything in this question, but nothing works.

yudhiesh

1 Answer


There are several ways to achieve this.

  • If it's a one-time task, simply (see the shell sketch after this list):
    • create a Compute Engine VM,
    • install the zip/unzip tools on it,
    • download your archive from Cloud Storage,
    • unzip the file locally,
    • upload the uncompressed files and folder structure to Cloud Storage: gsutil -m cp -r ./local-dir gs://myBucket
    • delete the VM.
  • If it's a periodic task (for example, uncompressing a new archive every week):
    • create a Cloud Build pipeline with a single step that performs exactly the same things as the VM approach (install zip, download the zip file, uncompress it, and send back the uncompressed files),
    • schedule that Cloud Build pipeline with Cloud Scheduler,
    • Cloud Build is serverless, and you can have up to 1000 GB of local storage (see the diskSizeGb note and the scheduling sketch below).
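
For the one-time option, a minimal shell sketch of those VM steps is below. The bucket (gs://myBucket), the archive name (frames_zip.zip), and the local directory are placeholders for illustration, not values taken from the question.

    # Install the unzip tool (Debian/Ubuntu image assumed)
    sudo apt-get update && sudo apt-get install -y unzip

    # Download the archive from Cloud Storage
    gsutil cp gs://myBucket/frames_zip.zip .

    # Unzip locally; this recreates the 0/, 5/ and 10/ sub-folders
    unzip -q frames_zip.zip -d ./local-dir

    # Upload the uncompressed folder structure back to Cloud Storage
    # (-m parallelizes the copy, which helps with ~600,000 small files)
    gsutil -m cp -r ./local-dir gs://myBucket/frames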

diskSizeGb: Use the diskSizeGb option to request a custom disk size for your build. The maximum size you can request is 1000 GB.
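
For the periodic option, here is a hedged sketch of how the pipeline could be wired up. It assumes a cloudbuild.yaml whose single step runs the same download/unzip/upload commands as above (working under /workspace) and whose options.diskSizeGb is raised so the build disk can hold the archive plus the extracted images; the project ID, job name, cron schedule, service account, and build.json request body below are all placeholders.

    # Run the build once by hand to validate the cloudbuild.yaml
    gcloud builds submit --no-source --config=cloudbuild.yaml

    # Then trigger it weekly with Cloud Scheduler calling the Cloud Build
    # REST API; build.json is the same build definition expressed as JSON
    gcloud scheduler jobs create http weekly-unzip \
        --schedule="0 3 * * 1" \
        --uri="https://cloudbuild.googleapis.com/v1/projects/my-project/builds" \
        --message-body-from-file=build.json \
        --oauth-service-account-email=scheduler@my-project.iam.gserviceaccount.com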

guillaume blaquiere
  • Worked perfectly. I already had the unzipped folder locally; I just had to set up gcloud on my laptop and upload it. – yudhiesh Aug 26 '20 at 01:59