15

Is this something that can be done with gsutil?

https://cloud.google.com/storage/docs/gsutil/commands/ls does not seem to mention any sorting functionality - only filtering by a date - which wouldn't work for my use case.

Chris Stryczynski
  • Possible duplicate of [Google Cloud Storage: How to get list of new files in bucket/folder using gsutil](https://stackoverflow.com/questions/44017463/google-cloud-storage-how-to-get-list-of-new-files-in-bucket-folder-using-gsutil) – jterrace Aug 18 '17 at 16:22
  • That is asking for selecting by a certain date. That seems to be filtering functionality. – Chris Stryczynski Aug 18 '17 at 20:51
  • Where is filter by date? – mathtick Sep 14 '21 at 11:03

4 Answers

9

This still doesn't seem to exist in gsutil, but there is a workaround described in another post.

The command used is this one:

gsutil ls -l gs://[bucket-name]/ | sort -k 2

Because this sorts the listing by date (the second column), the most recent objects end up at the bottom of the output, and you can extract the last entry with another pipe if you need to.
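To see why sorting on the second column works, here is a minimal sketch that runs the same kind of sort on a few made-up listing rows instead of a real bucket. The rows follow the size, timestamp, URL layout that appears elsewhere in this thread; the `sample_listing` helper and its contents are hypothetical, and the `-b` and `LC_ALL=C` additions are my own assumptions to make the comparison independent of column padding and locale.

```shell
# Hypothetical rows in the `gsutil ls -l` layout: size, timestamp, URL.
sample_listing() {
  printf '%s\n' \
    '        33  2021-08-11T09:24:55Z  gs://some-bucket-name/filename-37.txt' \
    '        12  2021-08-09T17:02:10Z  gs://some-bucket-name/filename-35.txt' \
    '        20  2021-08-10T08:15:00Z  gs://some-bucket-name/filename-36.txt'
}

# -k 2,2 restricts the sort key to the timestamp column and -b skips the
# padding in front of it. ISO 8601 UTC timestamps order correctly when
# compared as plain strings, so the oldest row comes first.
sample_listing | LC_ALL=C sort -b -k 2,2
```

With a real bucket, the `sample_listing` call would be replaced by the `gsutil ls -l` command above.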

night-gold
  • Just to note, per [GCS docs](https://cloud.google.com/storage/docs/metadata#generation-number), "There is no guarantee that generation numbers increase for successive versions, only that each new version has a unique generation number" – alkalinity Feb 23 '23 at 17:38
3
gsutil ls -l gs://<bucket-name> | sort -k 2 | tail -n 2 | head -1 | cut -d ' ' -f 7

It will not work well if there are fewer than two objects in the bucket, though.
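One way to avoid depending on the number of lines is to drop the summary row explicitly before sorting. The sketch below is only an illustration: `latest_object` is a hypothetical helper name, it assumes the summary row begins with `TOTAL:` and that object rows have three whitespace-separated fields, and it will misbehave for object names containing spaces.

```shell
# latest_object: read `gsutil ls -l` output on stdin and print the URL of
# the most recently modified object, or nothing if no object rows exist.
latest_object() {
  awk '$1 != "TOTAL:" && NF >= 3' |   # keep object rows, drop the summary row
    LC_ALL=C sort -b -k 2,2 |         # oldest first, by the timestamp column
    tail -n 1 |                       # newest row
    awk '{ print $NF }'               # last field is the URL (no spaces assumed)
}

# Usage (the bucket name is a placeholder):
#   gsutil ls -l gs://some-bucket-name | latest_object
```

Because the summary row is filtered out rather than skipped by position, an empty listing simply produces no output.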

1

Using gsutil from a host machine, this will populate the response array:

response=(`gsutil ls -l gs://some-bucket-name|sort -k 2|tail -2|head -1`)

Or, using gsutil from a Docker container:

response=(`docker run --name some-container-name --rm --volumes-from gcloud-config -it google/cloud-sdk:latest gsutil ls -l gs://some-bucket-name|sort -k 2|tail -2|head -1`)

Afterwards, to get the whole response, run:

echo ${response[@]}

will print for example:

33 2021-08-11T09:24:55Z gs://some-bucket-name/filename-37.txt

Or, to get a single field from the response (e.g. the filename):

echo ${response[2]}

will print the filename only:

gs://some-bucket-name/filename-37.txt
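As an alternative to indexing into an array, the same line can be split into named variables, which may read more clearly in a longer script. This is a small sketch using the example line above; the variable names `size`, `modified` and `url` are my own choice.

```shell
# A listing row in the format printed above.
line='33 2021-08-11T09:24:55Z gs://some-bucket-name/filename-37.txt'

# `read` splits on whitespace and assigns one field per variable;
# the last variable receives whatever remains of the line.
read -r size modified url <<EOF
$line
EOF

echo "$url"
echo "$modified"
```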
Vladimir Djuricic
0

For my use case, I wanted to find the most recent directory in my bucket. I number them in ascending order (with leading zeros), so all I need to get the most recent one is this:

gsutil ls -l gs://[bucket-name] | sort | tail -n 1 | cut -d '/' -f 4
  1. list the directory
  2. sort alphabetically (probably unnecessary)
  3. take the last line
  4. tokenise it with "/" delimiter
  5. get the 4th token, which is the directory name
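The steps above can be sketched as a small function that works on any listing passed on stdin, so it can be tried on sample rows without a bucket. `latest_dir` is a hypothetical name, and the extra filter that keeps only rows ending in `/` is my own addition, on the assumption that the listing may contain rows other than directory prefixes.

```shell
# latest_dir: read a bucket listing on stdin and print the name of the
# last top-level directory (prefix) in alphabetical order.
latest_dir() {
  awk '$NF ~ /\/$/ { print $NF }' |  # keep prefix rows such as gs://bucket/0003/
    LC_ALL=C sort |                  # zero-padded names sort in numeric order
    tail -n 1 |                      # last (highest-numbered) prefix
    cut -d '/' -f 4                  # tokens: "gs:", "", bucket, directory name
}

# Usage (the bucket name is a placeholder):
#   gsutil ls gs://my-bucket | latest_dir
```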
Codemonkey
  • Read this link regarding sequentially naming objects: https://cloud.google.com/storage/docs/best-practices#naming Avoid using sequential object names such as timestamp-based object names if you are uploading many objects in parallel. Objects with sequential names are stored consecutively, so they are likely to hit the same backend server. When this happens, throughput is constrained. In order to achieve optimal throughput, add the hash of the sequence number as part of the object name to make it non-sequential. – John Hanley Aug 18 '21 at 08:09
  • I've been doing it this way for years with no issues... I have root folders 0001 0002 0003 0004 etc; each of those is limited to 75GB in size; when it fills, I move on to the next one. The filenames WITHIN the folders, are md5 hashes of the file contents, so maybe that's suitable given the wording above? – Codemonkey Aug 18 '21 at 08:12
  • Cloud Storage does not have folders. What you think is a folder is just a prefix that is part of the object name. Buckets are a flat namespace. Unless you need optimum performance, this probably does not matter for you. For customers that require high performance for millions/billions of objects: **Objects with sequential names are stored consecutively, so they are likely to hit the same backend server**. I commented on your answer so that others do not copy your naming scheme without understanding the impact on performance. – John Hanley Aug 18 '21 at 08:19
  • I know that, but I'm using this as a backup of my server. I should have clarified that I meant that's my file structure on the server. – Codemonkey Aug 18 '21 at 09:49
  • I am not trying to inform you. I am commenting for future readers of your answer. – John Hanley Aug 18 '21 at 18:20