Let's say there are 10 folders in my bucket. I want to split the contents of the folders in a ratio of 0.8, 0.1, 0.1 and move them to three new folders: Train, Test and Val. I have previously done this by downloading the folders, splitting them and uploading them again. I now want to split the folders in the bucket itself.

I was able to connect to the bucket from a notebook using the "google-cloud-storage" library, following the post here. I was able to download and upload files. I'm not sure how to split the folders without downloading their contents.

Appreciate the help.

PS: I don't need the full code; an outline of the approach will do.

red1234

1 Answer

Cloud Storage only lets you READ and WRITE (CREATE/DELETE) objects. You can't move a blob inside the bucket: even where a move operation exists in the console or in a client library, it is really a WRITE/CREATE of the content under a new path followed by a WRITE/DELETE of the old path.

Thus, your strategy must follow the same logic:

  • Run gsutil ls to list all the files
  • Copy (or move) 80% into one directory, and 10% into each of the two other directories
  • Delete the old directory (unnecessary if you used the move operation).
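The steps above can be sketched in Python with the google-cloud-storage library already mentioned in the question. This is only a minimal sketch under assumptions: the helper names `split_names` and `move_split`, the fixed seed, and the choice to keep the original path under `Train/`, `Test/` or `Val/` are all hypothetical, and the bucket name in the usage comment is a placeholder. It relies on `Bucket.list_blobs`, `Bucket.copy_blob` and `Blob.delete`.

```python
import random


def split_names(names, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle object names and partition them into Train/Test/Val lists."""
    names = sorted(names)                  # stable order before shuffling
    random.Random(seed).shuffle(names)     # reproducible shuffle
    n_train = int(len(names) * ratios[0])
    n_test = int(len(names) * ratios[1])
    return {
        "Train": names[:n_train],
        "Test": names[n_train:n_train + n_test],
        "Val": names[n_train + n_test:],   # whatever is left over
    }


def move_split(bucket, source_prefix, seed=0):
    """Copy each object under source_prefix to Train/, Test/ or Val/,
    then delete the original (a "move" is a copy plus a delete)."""
    # Skip zero-byte "folder" placeholder objects whose names end in "/".
    blobs = {
        b.name: b
        for b in bucket.list_blobs(prefix=source_prefix)
        if not b.name.endswith("/")
    }
    for target, names in split_names(blobs, seed=seed).items():
        for name in names:
            # Assumption: keep the old path below the new top-level folder,
            # e.g. folder1/a.jpg -> Train/folder1/a.jpg
            bucket.copy_blob(blobs[name], bucket, new_name=f"{target}/{name}")
            blobs[name].delete()


# Usage (not run here; "my-bucket" is a placeholder name):
# from google.cloud import storage
# bucket = storage.Client().bucket("my-bucket")
# move_split(bucket, "folder1/")
```

Calling `move_split` once per source folder splits each folder separately; passing an empty prefix would instead split the whole bucket in one go.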

It's quicker than downloading and uploading the files, but it still takes time. Because Cloud Storage is not a file system and every operation is an API call, each file takes time to process. If you have thousands of files, it can take hours!

guillaume blaquiere