
Is there any ASYNC way to check if a file object exists in GCP Storage using Python?

I need to check whether a batch of files exists in GCP Storage, but the method described in the documentation blocks my application and takes too long. I want the app to check the files much faster than it does now.

  • I think there are no async methods in the GCS client. Maybe the best option would be checking this in another thread to avoid blocking the whole app (a sketch of this follows these comments). – Puteri Dec 28 '22 at 03:37
  • Do you know the exact name and path of your object in GCS? – guillaume blaquiere Dec 28 '22 at 13:14
  • You can filter by a path prefix, which lists all the files in the bucket that begin with that prefix. Also check this Stack Overflow [link](https://stackoverflow.com/questions/13525482/how-to-check-if-file-exists-in-google-cloud-storage) – Sathi Aiswarya Dec 28 '22 at 13:27
  • Packages like "celery" or "ray" help by making lightweight tasks that can do multiple checks simultaneously, or close enough. However, there are always API limits that you will hit, sooner than you imagine, so this is generally not the way to go: at _some_ point you will hit such a limit, and then what? So accept the fact that it takes N time and give the user e.g. a progress bar instead, while some worker does the work and reports back when it's done. It'll be much more complex than your current setup of course, so it's a trade-off. – Paul Collingwood Dec 28 '22 at 17:13
  • @Ferregina I've been thinking about it; although it's hard to deal with errors that way, so far this seems to be the best approach. – wedrano de carvalho Dec 28 '22 at 21:30
  • @guillaumeblaquiere yes – wedrano de carvalho Dec 28 '22 at 21:31
  • @SathiAiswarya I can't do that, because when a file doesn't exist in GCS I upload it. So when there are many files in storage it would cost me a lot of money, since the API call to list blobs is more expensive. – wedrano de carvalho Dec 28 '22 at 21:34
  • @PaulCollingwood yeah, I'm already dealing with rate limits in several parts of my code hahah, sad but it has to be like this. The problem with using celery is that I depend on the response to proceed with execution, so I can't just delegate to another task. It's a huge refactoring ahead if I go down this path. – wedrano de carvalho Dec 28 '22 at 21:44
  • I found a module to do asynchronous uploads and downloads (gcloud-aio-storage) and I'm already using it. I think I'll try to download the files with it, and if there's an error, I'll assume the file doesn't exist :/ – wedrano de carvalho Dec 28 '22 at 21:53
  • From the comment above I assume you found a module that works for your use case; maybe you can post it as an answer so other members facing a similar issue are helped out. – Sathi Aiswarya Dec 29 '22 at 06:57
  • If you know the full path/name of your object, perform a simple GET on it. You will immediately have the answer whether it is present or not. Of course, "immediately" still means the usual API call duration of 50–100 ms; I don't know if that's too much for your realtime use case. – guillaume blaquiere Dec 29 '22 at 09:45
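
A minimal sketch of the threading suggestion from the first comment, assuming the standard google-cloud-storage client and its `Blob.exists()` call (the bucket and object names are placeholders):

```python
import asyncio

from google.cloud import storage

# One blocking client, created once; the checks run in worker threads.
client = storage.Client()


def exists_sync(bucket_name: str, name: str) -> bool:
    # The blocking existence check from the official client library.
    return client.bucket(bucket_name).blob(name).exists()


async def main() -> None:
    names = ["a.txt", "b.txt", "c.txt"]  # placeholder object names
    # asyncio.to_thread (Python 3.9+) runs each blocking call in the
    # default thread pool, so the event loop itself is never blocked.
    results = await asyncio.gather(
        *(asyncio.to_thread(exists_sync, "my-bucket", n) for n in names)
    )
    print(dict(zip(names, results)))


asyncio.run(main())
```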

1 Answer


I found this module (gcloud-aio-storage) that makes asynchronous uploads and downloads to Cloud Storage. I use its download function, and if it raises an error I consider that the file does not exist.
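
A minimal sketch of that approach, assuming gcloud-aio-storage's `Storage.download()` and that a missing object surfaces as an aiohttp `ClientResponseError` with status 404 (the bucket and object names are placeholders):

```python
import asyncio

import aiohttp
from gcloud.aio.storage import Storage

BUCKET = "my-bucket"  # placeholder bucket name


async def blob_exists(client: Storage, name: str) -> bool:
    # Try to download the object; treat a 404 as "the file does not exist".
    try:
        await client.download(BUCKET, name)
        return True
    except aiohttp.ClientResponseError as e:
        if e.status == 404:
            return False
        raise  # auth errors, rate limits, etc. should not be swallowed


async def main() -> None:
    names = ["a.txt", "b.txt", "c.txt"]  # placeholder object names
    async with aiohttp.ClientSession() as session:
        client = Storage(session=session)
        # Check the whole batch concurrently instead of one blocking
        # request at a time -- this is where the speedup comes from.
        results = await asyncio.gather(*(blob_exists(client, n) for n in names))
    for name, exists in zip(names, results):
        print(name, "exists" if exists else "missing")


asyncio.run(main())
```

Note that this downloads the whole object body just to answer an existence question, so it can get expensive for large files.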

Another way, suggested by Guillaume blaquiere, is to make the HTTP calls directly against the Storage API, without using the GCP libraries for Python, if you know your full file path. You can find the documentation here: storage api endpoints
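
For completeness, a minimal sketch of that second approach, assuming the JSON API's objects.get endpoint and a bearer token fetched once with the synchronous google-auth library (the bucket and object names are placeholders):

```python
import asyncio
import urllib.parse

import aiohttp
import google.auth
import google.auth.transport.requests

BUCKET = "my-bucket"  # placeholder bucket name


def access_token() -> str:
    # Fetch a bearer token once with the (synchronous) google-auth
    # library, then reuse it for every async request.
    creds, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/devstorage.read_only"]
    )
    creds.refresh(google.auth.transport.requests.Request())
    return creds.token


async def blob_exists(session: aiohttp.ClientSession, token: str, name: str) -> bool:
    # Metadata GET on the JSON API: 200 means the object exists, 404 it doesn't.
    url = (
        "https://storage.googleapis.com/storage/v1/"
        f"b/{BUCKET}/o/{urllib.parse.quote(name, safe='')}"
    )
    headers = {"Authorization": f"Bearer {token}"}
    async with session.get(url, headers=headers) as resp:
        if resp.status == 404:
            return False
        resp.raise_for_status()
        return True


async def main() -> None:
    token = access_token()
    names = ["a.txt", "b.txt", "c.txt"]  # placeholder object names
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(blob_exists(session, token, n) for n in names)
        )
    for name, exists in zip(names, results):
        print(name, "exists" if exists else "missing")


asyncio.run(main())
```

A metadata GET like this only transfers the object's metadata, not its contents, so it answers the existence question without downloading the file.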