25

I have a GCS bucket containing some files in the path

gs://main-bucket/sub-directory-bucket/object1.gz

I would like to programmatically check if the sub-directory bucket contains one specific file. I would like to do this using gsutil.

How could this be done?

activelearner

6 Answers

13

If your script allows for non-zero exit codes, then:

#!/bin/bash

file_path=gs://main-bucket/sub-directory-bucket/object1.gz
gsutil -q stat "$file_path"
status=$?

if [[ $status == 0 ]]; then
  echo "File exists"
else
  echo "File does not exist"
fi

But if your script is set to fail on error (for example via an ERR trap), a failing gsutil stat aborts the script before $? can be inspected on the next line. Here is an alternative solution:

#!/bin/bash
trap 'exit' ERR

file_path=gs://main-bucket/sub-directory-bucket/object1.gz
result=$(gsutil -q stat "$file_path" || echo 1)
if [[ $result != 1 ]]; then
  echo "File exists"
else
  echo "File does not exist"
fi
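Another option that also works in fail-on-error scripts is to put the command directly in the `if` condition, because the shell does not treat a failing condition as a fatal error. The following is a minimal sketch, not a definitive implementation: the function names `gcs_object_exists` and `check_object` are hypothetical, and it assumes `gsutil -q stat` exits with 0 when the object exists and non-zero otherwise.

```shell
#!/bin/bash
set -e  # fail on error, as in the scenario above

# Hypothetical helper: succeeds if the object exists, fails otherwise.
# Assumes `gsutil -q stat` signals existence through its exit status.
gcs_object_exists() {
  gsutil -q stat "$1"
}

# Hypothetical wrapper: prints a message instead of aborting the script.
# A command used as an `if` condition does not trigger `set -e`.
check_object() {
  if gcs_object_exists "$1"; then
    echo "File exists"
  else
    echo "File does not exist"
  fi
}
```

It could then be called as `check_object gs://main-bucket/sub-directory-bucket/object1.gz`.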

Igor-S
12

You can use the gsutil stat command.

jterrace
  • 1
    Thank you jterrace. I did check out gsutil stat, especially the gsutil -q stat option. It looks perfect for my use case. However, Google says that we can only use gsutil -q stat on objects within the main directory. That is, it will not work for objects contained within sub-directories. Is there any other way to check if an object within a sub-directory exists? Thanks! – activelearner Mar 30 '15 at 23:08
  • 1
    Subdirectories don't really exist. Please see https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork – rein Mar 31 '15 at 00:37
  • @activelearner - it's talking specifically about directories themselves, not the objects inside, e.g. `gsutil stat gs://bucket/dir/subdir/foo.txt` would work fine. I'll file a bug about updating the docs to make it more clear. – jterrace Mar 31 '15 at 02:53
11

Use the gsutil stat command. To check whether a sub-directory contains any objects at all, use a wildcard (*). Quote the URL so that gsutil, not the shell, expands the wildcard.

For example:

gsutil -q stat 'gs://some-bucket/some-subdir/*'; echo $?

In your case:

gsutil -q stat 'gs://main-bucket/sub-directory-bucket/*'; echo $?

An exit status of 0 means a matching object exists; 1 means none exists.

Nam G VU
3

There is also gsutil ls (https://cloud.google.com/storage/docs/gsutil/commands/ls)

e.g.

gsutil ls gs://my-bucket/foo.txt

Output is either that same filepath or "CommandException: One or more URLs matched no objects."

starmandeluxe
1

Simply use the ls command and count the number of lines in the output.

If the count is 0, the file is not there; if it is 1, the file exists.

file_exists=$(gsutil ls gs://my_bucket/object1.gz | wc -l)

The same approach works for multiple files, of course.

files_number=$(gsutil ls 'gs://my_bucket/object*' | wc -l)
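The counting approach can be wrapped in a small function. This is only a sketch under stated assumptions: the name `count_gcs_matches` is hypothetical, and it assumes `gsutil ls` prints one `gs://` URL per matching object on stdout and sends its "matched no objects" message to stderr. It counts with `grep -c` rather than `wc -l` so that only lines starting with `gs://` are counted.

```shell
#!/bin/bash

# Hypothetical helper: prints the number of objects matching a URL or wildcard.
count_gcs_matches() {
  # Discard stderr so the "matched no objects" message is not shown.
  # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
  # keeps the function from failing in that case.
  gsutil ls "$1" 2>/dev/null | grep -c '^gs://' || true
}
```

For example, `files_number=$(count_gcs_matches 'gs://my_bucket/object*')` stores the count, and a value of 0 means nothing matched.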
Tycho
0

If you want to act on the result of that check (for example, load a bq table when a directory contains parquet files):

if gsutil -q stat 'gs://dir/*.parquet'; then bq load ... ; fi

Borja_042