70

Is there an option to count the number of files in bucket folders?

Like:

gsutil ls -count -recursive gs://bucket/folder

Result:   666 files

I just want a total number of files, so I can compare it against the sync folder on my server.

I can't find anything about this in the manual.

blackbishop
user2811846

7 Answers

118

Newer Approach


gsutil now has a du command. This makes it even easier to get a count:

$ gsutil du gs://pub | wc -l
232
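One caveat raised in the comments below: if the path contains "subdirectories", `du` also prints a rollup line per directory, which inflates the count. A minimal sketch of a workaround, simulated here with canned `du`-style output for a hypothetical gs://bucket/folder rather than a live bucket:

```shell
# Real invocation (hypothetical bucket path, requires gsutil and access):
#   gsutil du gs://bucket/folder | grep -v '/$' | wc -l
# Directory rollup lines end in "/", so grep -v '/$' drops them before
# wc -l counts the remaining object lines.
printf '%s\n' \
  '104413  gs://bucket/folder/a.jpg' \
  '172     gs://bucket/folder/b.json' \
  '10545   gs://bucket/folder/sub/' \
  '10372   gs://bucket/folder/sub/c.deb' \
  | grep -v '/$' | wc -l
# prints 3: the rollup line for sub/ is dropped
```

On a real bucket, replace the `printf` with the `gsutil du gs://bucket/folder` call itself.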

Older Approach


The gsutil ls command with options -l (long listing) and -R (recursive listing) will list the entire bucket recursively and then produce a total count of all objects, both files and directories, at the end:

$ gsutil ls -lR gs://pub
    104413  2011-04-03T20:58:02Z  gs://pub/SomeOfTheTeam.jpg
       172  2012-06-18T21:51:01Z  gs://pub/cloud_storage_storage_schema_v0.json
      1379  2012-06-18T21:51:01Z  gs://pub/cloud_storage_usage_schema_v0.json
   1767691  2013-09-18T07:57:42Z  gs://pub/gsutil.tar.gz
   2445111  2013-09-18T07:57:44Z  gs://pub/gsutil.zip
      1136  2012-07-19T16:01:05Z  gs://pub/gsutil_2.0.ReleaseNotes.txt
... <snipped> ...

gs://pub/apt/pool/main/p/python-socksipy-branch/:
     10372  2013-06-10T22:52:58Z  gs://pub/apt/pool/main/p/python-socksipy-branch/python-socksipy-branch_1.01_all.deb

gs://pub/shakespeare/:
        84  2010-05-07T23:36:25Z  gs://pub/shakespeare/rose.txt
TOTAL: 144 objects, 102723169 bytes (97.96 MB)

If you really just want the total, you can pipe the output to the tail command:

$ gsutil ls -lR gs://pub | tail -n 1
TOTAL: 144 objects, 102723169 bytes (97.96 MB)
Stephen
jterrace
  • 4
    Great, thanks ... just a little bit slow for 4 million files. Is this operation one call, or is it counted per bucket element? It could become expensive :-) – user2811846 Sep 26 '13 at 19:20
  • 1
    It does an object listing on the bucket, and pages through the results, I think 1000 at a time, so it will make N/1000 calls, where N is the number of objects you have. This is a class A operation per the pricing page. – jterrace Sep 26 '13 at 19:39
  • Hello, just logged in to say thanks, this helped. I was trying to use find, but that was not supported, so while searching for an alternative I stumbled upon your answer. It's been a great help. – Syed Mudabbir Jan 29 '16 at 15:58
  • 2
    the gsutil solution works great in gsutil v 4.15, @jterrace, but only if there are no "subdirectories" in the bucket/path you are listing. If there are subdirectories, du will roll up the size of the files below that directory and print a line to stdout for that directory (making the file count incorrect). Sorry for the late update to an old question. – booleys1012 Mar 14 '16 at 18:06
  • 2
    While `gsutil ls -l` works, is there a way in Windows (no tail or wc) to get a summary without needing to list the entire bucket contents? – mobcdi Aug 18 '16 at 11:22
  • `du` and `ls` aren't counting as much as `wc -l` is. – dlamblin May 24 '17 at 01:23
  • 1
    @jterrace Great, thanks. It also counts each directory as an object, adding to the count. Can we somehow count only files, excluding directories? – Yogesh Patil Oct 25 '17 at 09:42
  • @jterrace looks like du is giving file sizes, not counts! – REdim.Learning May 03 '19 at 12:33
  • @REdim.Learning - yes, but it prints one per line, which is why I pipe to `wc -l` – jterrace May 05 '19 at 16:47
  • @mobcdi If you have Git for Windows, you have Git Bash. Use that. – Miles Erickson Feb 20 '20 at 20:22
  • Clearly GCP is using this to get more money from us. They clearly know the size and count. It should be available in the API. We should not accept less. – nroose May 13 '21 at 04:44
  • 1
    @YogeshPatil A minor trick to ignore the directory itself. `gsutil du gs://folder/* | wc -l` – 刘宇翔 Mar 30 '23 at 02:23
38

If you have the option not to use gsutil, the easiest way is to check it on Google Cloud Platform. Go to Monitoring > Metrics explorer:

  • Resource type: GCS Bucket
  • Metric: Object count

Then, in the table below, you have the number of objects each bucket contains.
Jack
  • 7
    this is an underappreciated answer. – Yevgen Safronov Jan 04 '22 at 14:08
  • 6
    This is WAY faster than using gsutil if you aren't doing something programmatically and you just need the count, AND it doesn't dip into your Class A Operations quota. – ingernet Jan 31 '22 at 17:00
  • 2
    Especially helpful when your bucket has more than a million objects and the total size exceeds a few GBs. – Vishwas M.R Jun 04 '22 at 15:34
  • 3
    Of course, this only works if you want to count the number of files in the entire bucket. You can't use this to check the number of files in a specific folder inside the bucket. – Jérémy Jun 15 '22 at 06:44
  • 2
    The downside to this great solution is that the calculation only occurs once per day. This means that any results shown are stale and may not reflect the current story. – Kolban Aug 06 '22 at 19:29
12

You want a `gsutil ls -count -recursive` for gs://bucket/folder? Alright: `gsutil ls gs://bucket/folder/**` will list just the full URLs of the files under gs://bucket/folder, without the footer or the lines ending in a colon. Piping that to `wc -l` gives you the line count of the result.

gsutil ls gs://bucket/folder/** | wc -l
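Since the question's goal is to compare the bucket against a sync folder on a server, the local side can be counted with `find`. A sketch, where the bucket path is hypothetical and the local part is demonstrated on a throwaway directory:

```shell
# Remote side (hypothetical bucket, requires gsutil and access):
#   remote=$(gsutil ls 'gs://bucket/folder/**' | wc -l)
# Local side, demonstrated on a throwaway directory:
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
mkdir "$dir/sub"
touch "$dir/sub/c.txt"
find "$dir" -type f | wc -l   # counts regular files at any depth: prints 3
rm -rf "$dir"
```

If the two counts match, the sync is at least complete in number, though not necessarily in content.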

dlamblin
  • Why use `**` not just `*`? – northtree Jan 31 '19 at 01:27
  • 2
    @northtree I think in this case it might be equivalent, but ** does work for multiple levels at once, so I think `/folder/**/*.js` would find all js files under any depth of directories after folder (except in folder itself) while `/folder/*/*.js` would only work for js files within a directory in folder. – dlamblin Jan 31 '19 at 06:38
7

As someone who had 4.5M objects in a bucket, I used `gsutil du gs://bucket/folder | wc -l`, which took ~24 minutes.

Kevin Danikowski
4

This doesn't work recursively, but you can also get the count of a single large folder from the console. This method has the advantage of being very fast.

  1. Select Sort and filter from the filter menu in your bucket.

  2. Reverse the sort order so that Google Cloud Storage calculates the number of files/folders.

  3. View the count of files/folders in the current folder.

ChrisRockGM
3
gsutil ls -lR gs://Folder1/Folder2/Folder3/** | tail -n 1
Areza
Dhiraj
2

This gist shows how to iterate through all Cloud Storage buckets and list the number of objects in each. Compliments of @vinoaj

for VARIABLE in $(gsutil ls)
do
  # Count objects, excluding directory rollup lines (which end in "/")
  echo "$(gsutil du "$VARIABLE" | grep -v '/$' | wc -l)" "$VARIABLE"
done

To filter buckets, add a grep such as for VARIABLE in $(gsutil ls | grep "^gs://bucketname")

In the console, you can click Activate Cloud Shell in the top right and paste this in to get results. If you save the commands as a bash script, then run chmod u+x program_name so the script can run in the GCP Cloud Shell.

NOTE: When you do `gsutil du gs://my-bucket/logs | wc -l`, the result includes an "extra" line for each bucket and sub-directory. For example, 3 files in a top-level bucket will count as 4, and 3 files in a sub-directory will count as 5.
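The note can be sketched with canned `du`-style output, no bucket required: three files in a sub-directory yield five lines, and filtering the lines that end in "/" restores the true file count:

```shell
# Canned output imitating `gsutil du gs://my-bucket/logs` for 3 log files:
du_out='1  gs://my-bucket/logs/a.log
2  gs://my-bucket/logs/b.log
3  gs://my-bucket/logs/c.log
6  gs://my-bucket/logs/
6  gs://my-bucket/'
echo "$du_out" | wc -l                  # prints 5: rollup lines inflate the count
echo "$du_out" | grep -v '/$' | wc -l   # prints 3: object lines only
```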

smoore4