62

I'm currently doing this, but it's VERY slow since I have several terabytes of data in the bucket:

gsutil du -sh gs://my-bucket-1/

And the same for a sub-folder:

gsutil du -sh gs://my-bucket-1/folder

Is it possible to somehow obtain the total size of a complete bucket (or a sub-folder) elsewhere or in some other fashion which is much faster?

fredrik

10 Answers

41

The visibility for Google Cloud Storage here is pretty bad.

The fastest way is actually to pull the Stackdriver metrics and look at the total size in bytes (the storage.googleapis.com/storage/total_bytes metric).

Unfortunately, there is practically no filtering you can do in Stackdriver. You can't wildcard the bucket name, and the almost-useless bucket resource labels are NOT aggregatable in Stackdriver metrics.

Also, this is bucket level only, not per prefix.

The Stackdriver metrics are updated daily, so unless you can wait a day you can't use this to get the current size right now.

UPDATE: Stackdriver metrics now support user metadata labels, so you can label your GCS buckets and aggregate those metrics by the custom labels you apply.
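
For reference, a minimal sketch of pulling that metric programmatically with the Cloud Monitoring Python client (google-cloud-monitoring); the project ID and the 25-hour lookback are assumptions to adjust, and the window needs to be at least a day wide since the metric is only written about once a day:

import time
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # assumption: replace with your project ID

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 25 * 3600}, "end_time": {"seconds": now}}
)

# One time series per bucket (and storage class) for the total_bytes metric
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": 'metric.type = "storage.googleapis.com/storage/total_bytes"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    bucket = series.resource.labels["bucket_name"]
    latest_bytes = series.points[0].value.double_value  # points are newest-first
    print(f"{bucket}: {latest_bytes / 1024 ** 3:.2f} GiB")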

Edit

I want to add a word of warning if you are creating monitors off of this metric. There is a really crappy bug with this metric right now.

GCP occasionally has platform issues that cause this metric to stop getting written. I think it's tenant-specific (maybe?), so you also won't see it on their public health status pages. It also seems poorly documented for their internal support staff, because every time we open a ticket to complain they seem to think we are lying, and it takes some back and forth before they even acknowledge it's broken.

I think this happens if you have many buckets and something crashes on their end and stops writing metrics to your projects. While it does not happen all the time, we see it several times a year.

For example, it just happened to us again. This is what I'm seeing in Stackdriver right now across all our projects: [Stackdriver screenshot]

Response from GCP support

Just adding the last response we got from GCP support during this most recent metric outage. I'll add that all our buckets were accessible; it was just that this metric was not being written:

The product team concluded their investigation stating that this was indeed a widespread issue, not tied to your projects only. This internal issue caused unavailability for some GCS buckets, which was affecting the metering systems directly, thus the reason why the "GCS Bucket Total Bytes" metric was not available.

red888
  • This works! Thanks for this answer, you just saved me hours! – chintan sureliya Dec 09 '19 at 08:09
  • 4
    Source: https://cloud.google.com/storage/docs/getting-bucket-information – dwich Aug 14 '20 at 11:19
  • `You can't wildcard the bucket name` Yes you can... the operators with ~ mean they support RegEx... so to find all buckets that start with video- use `bucket_name =~ video.*` – Ray Foss Mar 08 '21 at 05:26
  • Please note that SD metrics on bucket size and object count are only updated once a day! https://cloud.google.com/monitoring/api/metrics_gcp#storage/storage/object_count – Zaar Hai Jul 26 '21 at 02:22
  • @RayFoss Google has since released MQL which is what is being referred to with regex: https://www.infoq.com/news/2021/01/google-cloud-monitoring-mql/ – red888 Sep 15 '21 at 14:23
25

Unfortunately, no. If you need to know what size the bucket is right now, there's no faster way than what you're doing.

If you need to check on this regularly, you can enable bucket logging. Google Cloud Storage will generate a daily storage log that you can use to check the size of the bucket. If that would be useful, you can read more about it here: https://cloud.google.com/storage/docs/accesslogs#delivery
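
If you go this route, the daily storage log is just a small CSV object with bucket and storage_byte_hours columns, delivered into the log bucket you configure. Below is a rough sketch of reading the newest one with the Python client; the log bucket name and the "_storage_" object-name pattern are assumptions based on the default naming in the docs, and dividing byte-hours by 24 gives the average bucket size in bytes for that day:

import csv
from google.cloud import storage

LOG_BUCket = "my-usage-logs"  # assumption: the bucket your storage logs are delivered to

client = storage.Client()
# Storage logs are named like <bucket>_storage_<timestamp>..., unlike the usage logs
storage_logs = [b for b in client.list_blobs(LOG_BUCket) if "_storage_" in b.name]
latest = max(storage_logs, key=lambda b: b.name)  # newest log by name

for row in csv.DictReader(latest.download_as_text().splitlines()):
    avg_bytes = float(row["storage_byte_hours"]) / 24  # byte-hours accumulated over one day
    print(f'{row["bucket"]}: {avg_bytes / 1024 ** 3:.2f} GiB (daily average)')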

Brandon Yarbrough
  • 2
    How many requests does `du` make? One per object, or is it somehow optimised? Good to know from the billing point of view. – Ivan Balashov Dec 16 '16 at 17:25
  • How to view the bucket size in the logs? I enabled logs, but don't know how to find it in the logs. I'd prefer not to download them. I'm guessing there's a search query I can enter on the Logs screen? – androidguy May 04 '18 at 09:42
17

If the daily storage log you get from enabling bucket logging (per Brandon's suggestion) won't work for you, one thing you could do to speed things up is to shard the du request. For example, you could do something like:

gsutil du -s gs://my-bucket-1/a* > a.size &
gsutil du -s gs://my-bucket-1/b* > b.size &
...
gsutil du -s gs://my-bucket-1/z* > z.size &
wait
awk '{sum+=$1} END {print sum}' *.size

(assuming your subfolders are named starting with letters of the English alphabet; if not, you'd need to adjust how you run the above commands).
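
A rough Python equivalent of the same sharding idea, listing one prefix per thread with the google-cloud-storage client (the bucket name and the a-z prefix split are assumptions, just like in the gsutil version, so objects outside those prefixes would be missed):

import string
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

BUCKET = "my-bucket-1"  # assumption: your bucket

client = storage.Client()

def prefix_bytes(prefix):
    # Each call performs an independent listing restricted to one prefix
    return sum(blob.size for blob in client.list_blobs(BUCKET, prefix=prefix))

with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(prefix_bytes, string.ascii_lowercase))

print(f"{total / 1024 ** 3:.1f} GiB")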

Daniel Cukier
Mike Schwartz
7

Use the built-in dashboard: Operations -> Monitoring -> Dashboards -> Cloud Storage

The graph at the bottom shows the bucket size for all buckets, or you can select an individual bucket to drill down.

Note that the metric is only updated once per day.

object size graph

dan carter
3

Google Console

Platform -> Monitoring -> Dashboard -> Select the bucket

Scroll down to see the object size for that bucket.

Chris Catignani
NAW
3

With Python you can get the size of your bucket as follows:

from google.cloud import storage

storage_client = storage.Client()
blobs = storage_client.list_blobs(bucket_or_name='name_of_your_bucket')

blobs_total_size = 0
for blob in blobs:
    blobs_total_size += blob.size  # size in bytes

print(blobs_total_size / (1024 ** 3))  # size in GiB
Sander van den Oord
1

I found that using the CLI it was frequently timing out, but that may be because I was reviewing Coldline storage.

For a GUI solution, look at Cloudberry Explorer.

GUI view of storage

1
  • To include files in subfolders: `gsutil ls -l -R gs://${bucket_name}`
  • This calculates the size of all files in all buckets (a Python equivalent is sketched below): `for bucket_name in $(gcloud storage buckets list "--format=value(name)"); do echo "$bucket_name;$(gsutil ls -l -R gs://${bucket_name})"; done | grep TOTAL | awk '{s+=$4}END{print s/1024/1024/1024/1024}'`
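
A rough Python equivalent of that loop, summing object sizes across every bucket in the project (it still lists every object, so it is just as slow as gsutil on large buckets):

from google.cloud import storage

client = storage.Client()

grand_total = 0
for bucket in client.list_buckets():
    # Lists every object in the bucket, so this is as slow as `gsutil du`
    bucket_bytes = sum(blob.size for blob in client.list_blobs(bucket.name))
    grand_total += bucket_bytes
    print(f"{bucket.name}: {bucket_bytes / 1024 ** 4:.3f} TiB")

print(f"TOTAL: {grand_total / 1024 ** 4:.3f} TiB")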
Vitek
0

For me, the following command helped:

gsutil ls -l gs://{bucket_name}

It then gives output like this after listing all files:

TOTAL: 6442 objects, 143992287936 bytes (134.1 GiB)
Anton Kumpan
  • 1
    Doesn't work; I have a file directly in the bucket root as well as folders, and this command doesn't count the files inside the folders. – Kamal Hossain Oct 13 '22 at 02:58
0

I guess pulling the metric from GCP is a better approach than using gsutil for getting the bucket size.

#!/bin/bash
# Query the Cloud Monitoring API for the storage/total_bytes metric and print
# one line per bucket/location/storage class, sorted by size (largest first).
PROJECT_ID='<<PROJECT_ID>>'
ACCESS_TOKEN="$(gcloud auth print-access-token)"
CHECK_TIME=10   # how many minutes back to look for metric points
STARTTIME=$(date --date="${CHECK_TIME} minutes ago" -u +"%Y-%m-%dT%H:%M:%SZ")
ENDTIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# URL-encode the metric filter and the interval timestamps for the query string
FILTER="$( echo -n 'metric.type="storage.googleapis.com/storage/total_bytes"' | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
START="$( echo -n "${STARTTIME}" | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
END="$( echo -n "${ENDTIME}" | ruby -n -r 'cgi' -e 'print(CGI.escape($_))' )"
DETAILS=$(curl -s -H "Authorization: Bearer ${ACCESS_TOKEN}" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries/?filter=${FILTER}&interval.startTime=${START}&interval.endTime=${END}")
# Extract bucket,location,storage_class,bytes as CSV and sort by bytes descending
for i in $(echo "$DETAILS" | jq -r ".timeSeries[]|[.resource.labels.bucket_name,.resource.labels.location,.metric.labels.storage_class,.points[0].value.doubleValue]|@csv"|sort -t, -n -k4,4nr ); do
  f1=${i%,*}   # bucket,location,storage_class
  f2=${i##*,}  # raw byte count
  size=$(numfmt --to=iec-i --suffix=B --format="%9.2f" $f2)
  echo $f1,$size
done
Vadiraj k.s