I am trying to delete many files from Google Cloud Storage at once.
I am using the following code:

public List<Boolean> deleteObjects(List<String> fileParams) {
  List<BlobId> blobs =
      fileParams.stream()
          .map(
              file -> {
                logger.info("deleteObject: {}", file);
                return BlobId.of(bucketName, file);
              })
          .collect(Collectors.toList());
  return storage.delete(blobs);
}

This call takes a very long time: I tried to delete 150k files and it took almost 1 hour.

I would like to run it as "fire and forget".

I saw in the JS example that the API is async by nature:

await storage.bucket(bucketName).file(fileName).delete();

I didn't find such an example for Java, either with or without a batch.
I guess I can start a new thread and run it there, but I wanted to know if the API supports something like that natively.

Does the API natively support running a command asynchronously?

oshai

2 Answers

The delete object API call is synchronous: it does not return a job ID that you have to poll to find out whether the operation is done. The client library therefore has no asynchronous operation to expose, because the underlying call itself is synchronous.

In Node.js, the best practice is to use an async function when you perform an API call. That is a matter of language design, not API behavior. You can do the same in Java, Python, and Go, but it is not available out of the box; you need to implement the concurrency yourself.

guillaume blaquiere

Guillaume's answer is technically correct in that the underlying HTTP API is synchronous, but it sounds like you are looking for a way to make an async call within your code and let it run in the background.

The com.google.cloud.storage API that you are using does not have this built in. You can always run the call in a background thread or use Futures, as demonstrated in the answers here.
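As a rough illustration of the Futures approach, here is a minimal sketch that splits the list into chunks and runs each chunk on an executor. The names AsyncDelete, deleteAsync, chunkSize and deleter are made up for this example and are not part of the Google client library. The deleter argument stands in for the blocking call from the question, for example ids -> storage.delete(ids).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

class AsyncDelete {

  /**
   * Runs a blocking bulk-delete function over fixed-size chunks on the given executor.
   * `deleter` is a placeholder for the real blocking call (an assumption of this sketch),
   * e.g. ids -> storage.delete(ids) with a List of BlobId.
   */
  static <T> CompletableFuture<List<Boolean>> deleteAsync(
      List<T> ids, int chunkSize, Function<List<T>, List<Boolean>> deleter, Executor executor) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    List<CompletableFuture<List<Boolean>>> parts = new ArrayList<>();
    for (int i = 0; i < ids.size(); i += chunkSize) {
      // Copy the chunk so each task owns its own list.
      List<T> chunk = new ArrayList<>(ids.subList(i, Math.min(i + chunkSize, ids.size())));
      parts.add(CompletableFuture.supplyAsync(() -> deleter.apply(chunk), executor));
    }
    // Complete once every chunk is done, then concatenate results in input order.
    return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
        .thenApply(
            done -> {
              List<Boolean> results = new ArrayList<>();
              for (CompletableFuture<List<Boolean>> part : parts) {
                results.addAll(part.join());
              }
              return results;
            });
  }
}
```

The caller gets a CompletableFuture back immediately and can either ignore it ("fire and forget") or attach a callback such as thenAccept to log failures. The chunk size and thread count are tuning knobs to experiment with, not recommended values.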

You can also access Google Cloud Storage using S3's Java library, which does have a built-in async API. Use the migration guide to use these libraries together. Note that GCS does not support S3's Multiple Object Delete API, so you can't use the library's deleteObjects method.

Note that because the underlying HTTP API is synchronous, your application cannot quit while the delete calls are still running in a background thread; it has to stay alive until they finish.
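One way to keep the process alive until the background work is done is to shut the executor down and wait for it before exiting. The sketch below uses only java.util.concurrent; the class name BackgroundDeletes, its methods and the pool size of 4 are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BackgroundDeletes {
  // Hypothetical pool size; tune it for your workload.
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  /** Queues a blocking delete task, e.g. () -> storage.delete(blobs). */
  void submit(Runnable deleteTask) {
    pool.execute(deleteTask);
  }

  /**
   * Stops accepting new tasks and waits for queued ones to finish.
   * Returns true if everything completed before the timeout.
   */
  boolean shutdownAndWait(long timeout, TimeUnit unit) {
    pool.shutdown();
    try {
      return pool.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      // Restore the interrupt flag and report that we did not finish waiting.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Calling shutdownAndWait at the end of main, or from a shutdown hook, gives the queued deletes a chance to complete before the JVM exits.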

David