
I want to do frequent checks to see if a daily set of files is in a GCS bucket. The files are generated in the format Context_YYYYMMDDHHMMSS.txt.gz, and I'd like to be able to check based on the context and the date portion of the timestamp (e.g. Sales_20190908*.txt.gz).

What's the best way to go about this in Python?

Cam

3 Answers


I would suggest setting up Cloud Pub/Sub notifications for the bucket. Your application can receive a push notification every time a new object is uploaded to the bucket.
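A minimal sketch of creating such a notification with the google-cloud-storage client; "my-bucket" and "daily-files-topic" are placeholder names, and the Pub/Sub topic is assumed to already exist:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# Publish a message to the topic whenever a new object is finalized;
# blob_name_prefix limits notifications to the files you care about.
notification = bucket.notification(
    topic_name="daily-files-topic",
    event_types=["OBJECT_FINALIZE"],
    blob_name_prefix="Sales_",
)
notification.create()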

jterrace

You can pull the file names from a GCS bucket by listing its blobs with the google-cloud-storage client library.
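For example, a short sketch with the google-cloud-storage client ("my-bucket" is a placeholder name):

from google.cloud import storage

client = storage.Client()
# list_blobs returns an iterator over the objects in the bucket
blob_list = [blob.name for blob in client.list_blobs("my-bucket")]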

You can get the current date and time from the datetime module and turn it into a string:

import datetime

def date_to_string():
    # Current timestamp formatted as YYYYMMDDHHMMSS
    return datetime.datetime.now().strftime("%Y%m%d%H%M%S")

Then you can check whether a matching file name is in the list of blob names. Since you only want to match on the date portion of the timestamp, compare against the YYYYMMDD prefix rather than the full name:

date_prefix = date_to_string()[:8]  # keep only the YYYYMMDD portion
if any(name.startswith("Context_" + date_prefix) and name.endswith(".txt.gz")
       for name in blob_list):
    do_something()
mattrea6

Since you want to do frequent checks, I would suggest creating a cron job with Cloud Scheduler that triggers a Cloud Function via an HTTP request. The function can then list the objects in the bucket and filter them by name.
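A rough sketch of such a function, using the google-cloud-storage client; check_daily_files and "my-bucket" are placeholder names:

from datetime import datetime
from google.cloud import storage

def check_daily_files(request):
    # HTTP-triggered Cloud Function: list today's files by name prefix.
    client = storage.Client()
    prefix = "Sales_" + datetime.now().strftime("%Y%m%d")
    blobs = client.list_blobs("my-bucket", prefix=prefix)
    names = [b.name for b in blobs if b.name.endswith(".txt.gz")]
    if names:
        return "Found: " + ", ".join(names)
    return "No matching files yet"

Cloud Scheduler can then be pointed at the function's trigger URL on whatever schedule you need.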

The Cloud Storage documentation on listing objects has more details.

tzovourn